1 Introduction

During this period our work concentrated on three somewhat different
areas. We looked at and developed a number of error concealment schemes
for use in a variety of video coding environments. This work is described
in the accompanying (draft) Masters thesis. In the thesis we describe the
application of these techniques to the MPEG video coding scheme. We felt
that the ordering approach used in MPEG would be a challenge to any
error concealment/error recovery technique.

We continued with our work in the Vector Quantization area. The work
on recursively indexed Vector Quantization will be reported in a PhD
dissertation during the current period. We have also developed a new type of
vector quantizer, called Scan Predictive Vector Quantization. The
Scan Predictive VQ was tested on data processed at Goddard to approximate
Landsat 7 HRMSI resolution, and compared favorably with existing
VQ techniques. A paper describing this work is included with this report.
The paper has been submitted to IEEE Transactions on Image Processing.

The third area is concerned more with reconstruction than compression.
While there is a variety of efficient lossless image compression schemes, they
all have a common property: they use past data to encode future data.
This is done either by taking differences, by context modeling, or by building
dictionaries. When encoding large images, this common property becomes a
drawback. When the user wishes to decode only a portion of the dataset, the
requirement that the past history be available forces the decoding of a
significantly larger portion of the image than desired by the user. Even with
intelligent partitioning of the image dataset, the number of pixels decoded
may be four times the number of pixels requested. We have developed an
approach which can be used with any lossless compression scheme and which
reduces the additional number of pixels decoded to about 7%. A paper
describing these results is included in this report. This work will be
reported at the 1994 International Geoscience and Remote Sensing Symposium.

During this period, the following paper appeared in print 

“Coding of Color-Mapped Images,” (with A.C. Hadenfeldt), IEEE Transactions
on Geoscience and Remote Sensing, vol. 32, pp. 534-541, May 1994.
A copy of the paper is included with this report.

Also during this period, the following paper was accepted for publication 

“A Constrained Joint Source Channel Coder Design,” (with F. Liu and
J.D. Gibson), to appear in IEEE Journal on Selected Areas in Communications.

Abstract


In packet switched networks, even with error correction protection, packet loss is 
unavoidable. Hence a method of error recovery, which takes the characteristics of the 
video signals into account, is required. This process, known as Error Concealment, 
attempts to recover or reconstruct the missing blocks from the structured picture 
data. A method of error concealment based on motion estimation is proposed
in this paper. This method makes use of the fact that most of the frames in a
sequence look alike (except during shifts due to movement) and hence uses past
as well as future information, in addition to the information from the same
frame, to reconstruct the missing information in a frame.

The underlying assumptions in this method of error concealment are 

• pixels in the image are much smaller than any of the important details 


• most of the pixels’ neighbors represent the same structure. 


Chapter 1 


Introduction 


For multimedia communication and information services, the evolution of
asynchronous transfer mode (ATM) networks based on packet switching offers
flexibility and freedom in maintaining the quality of these services. Packet switched
networks were originally invented for carrying burst-type data, as it was
uneconomical to use continuously connected circuits. In conventional circuit switched
connections a dedicated path is established and a bandwidth is assigned in advance.
Quality of service would degrade if the output rate of the source were to exceed
the channel capacity, while the excess channel capacity would be wasted if the
output rate of the source were less than the channel capacity. Packet video is a
relatively new field and has attracted a lot of attention. As image information
representing detail, motion, etc. varies, variable bit rate (VBR) coding tailored
for packet switched networks can be utilized to maintain constant image quality.
Also, by channel sharing among multiple video sources, transmission efficiency
could be improved.


The flexibility of packet switching provides new opportunities for video
communication and at the same time it also presents new challenges. The problems
inherent in this scenario are packet loss and packet delay. The former can be due
to random bit errors in the packet destination address and heavy traffic at certain
nodes in the ATM network, while the latter is caused by holding a packet at any of the
switching nodes until a slot is open, resulting in a differential transmission delay
between packets. The differential delay causes problems in the timing relationship between
video generation and reconstruction for display. Hence it is necessary to incorporate
error correction protection into coding techniques compatible with packet video. A
simple method of incorporating an error protection scheme is to generate parity packets
containing parity bits generated from the information packets. A lost packet could
then be recovered by initiating error protection at the receiver. This scheme,
however, increases the rate and hence contributes to packet loss. It has been observed,
however, that the error correction ability of the error protection system more than
makes up for the packet loss introduced by the increased rate.

In packet switched networks, even with error correction protection, packet loss
is unavoidable, and video packet transmission requires a correction method
which takes the characteristics of video signals and video coding into account. In
such a method the receiver detects the damage to the picture caused by the lost
packet and performs error concealment. This thesis proposes an error concealment
method for video packets lost during transmission.

Chapter 2 describes various image compression techniques. Broadly these are 
classified into spatial domain techniques and frequency domain techniques. Advan- 
tages and disadvantages of each of these techniques are discussed in this chapter. 

Chapter 3 describes various components of a basic video codec suitable for packet 
video. It also briefly describes how error correction protection can be incorporated 
into the coding algorithm. 

Chapter 4 presents various error concealment and reconstruction algorithms. A
new algorithm for error concealment based on estimating motion from frames both
in the past and the future is presented. The performance of this method on two
motion sequences is presented, together with the results and its limitations.


Chapter 5 presents the conclusion. 


Chapter 2 


Understanding Image 
Compression 

The goal of data compression of images is to reduce transmission and storage costs.
To achieve compression it is necessary to consider representations beyond simple
analog to digital conversion of image data. Several other factors, such as the high
correlation between adjacent pixels, have to be taken into consideration while
compressing images (spatial domain compression). In addition, correlation between adjacent
frames has to be taken into consideration while compressing motion sequences. In
this chapter we discuss various image coding techniques mostly applicable to moving
images.

2.1 Understanding Digital Images

It has been known for quite some time that a wide spectrum of colors can be
generated from a set of three primaries: red, green and blue. Television displays generate
colors by mixing lights of these additive primaries. The color space obtained through
combining the three colors can be determined by drawing a triangle on a special
color chart with each of the base colors as an endpoint. This classic color chart was
established by the Commission Internationale de l'Eclairage (CIE).

One of the special concepts introduced by the 1931 CIE chart was the isolation of
luminance (brightness) from chrominance (hue). Using the guidelines of the CIE, the
National Television System Committee (NTSC) defined picture transmission in the
form of luminance and chrominance components. The new color space was labeled
YIQ, where the Y stood for the luminance component while the I and the Q stood
for the in-phase and quadrature components of chrominance respectively.

In Europe two television standards later emerged, the Phase Alternation Line (PAL)
format and the Sequentiel Couleur a Memoire (SECAM) format, both with an identical
color space, YUV. The difference between the PAL/SECAM YUV color space and
the NTSC YIQ color space is a 33 degree rotation in UV space. The YUV format
as well as the YIQ format concentrates most of the image information into the
luminance and less into the chrominance. The result is that each of the
components can be coded individually without much loss of efficiency.

2.1.1 File Formats

There are two formats for PAL-style input: CIF, representing an input file of 
352x288 for the luminance and 176x144 for the chrominance components; and 
QCIF, representing an input file of 176x144 for the luminance and 88x72 for the 
chrominance components. For NTSC images the most common input style is the 
CIF-style which represents an input file of 352x240 for the luminance and 176x120 
for each of the chrominance components. 


2.2 Intra Frame Processing 

2.2.1 Spatial Domain Methods For Image Compression 

In this section we consider digital coding techniques that operate on the data in the 
spatial domain. 

Pulse Code Modulation (PCM) 

In pulse code modulation the incoming video signal is sampled and quantized. Hence 
it is just a digital representation of the original analog signal. This method of coding 
does not consider the inter-pixel correlation while coding the image sequences. 

Predictive Coding

In PCM, successive inputs to the quantizer are treated independently, so there is no 
exploitation of the significant redundancy present in images. 

The philosophy behind predictive coding is to remove redundancy between
successive samples of input data and to quantize only the new information. The most
common example of a predictive coding system is Differential Pulse Code
Modulation (DPCM). In DPCM, the difference between successive samples is quantized
and transmitted, as opposed to other coding schemes where the original samples are
quantized. This scheme works well for images since there is a lot of correlation
between adjacent pixels [1].

The basic equations describing DPCM are (see Figure 2.1)

    d(n) = x(n) - \hat{x}(n)            (2.1)

    u(n) = d(n) - q(n)                  (2.2)

    y(n) = \hat{x}(n) + v(n)            (2.3)

where y(n) is the DPCM approximation to the coder input x(n), \hat{x}(n) is the
predicted value of x(n), d(n) is the prediction error, q(n) is the quantization
error, u(n) is the quantized prediction error, and v(n) is the quantized
prediction error which may have been corrupted by channel noise [2].

Figure 2.1: Block diagram of DPCM: (a) Coder and (b) Decoder
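
The DPCM equations above can be illustrated with a minimal Python sketch. It
makes several assumptions that are not in the text: the predictor is simply the
previous reconstructed sample, the quantizer is uniform with a hypothetical
step size `step`, and the channel is noiseless, so that v(n) = u(n).

```python
def dpcm_encode(samples, step):
    """Return the quantized prediction errors u(n) for a list of samples.

    Assumptions (illustrative only): previous-sample predictor and a
    uniform quantizer with the given step size.
    """
    prediction = 0.0
    quantized_errors = []
    for x in samples:
        d = x - prediction               # prediction error d(n)
        u = step * round(d / step)       # quantized prediction error u(n)
        quantized_errors.append(u)
        prediction = prediction + u      # track what the decoder will rebuild
    return quantized_errors


def dpcm_decode(received_errors):
    """Rebuild y(n) from the received quantized prediction errors v(n)."""
    prediction = 0.0
    output = []
    for v in received_errors:
        y = prediction + v               # y(n) = prediction + v(n)
        output.append(y)
        prediction = y
    return output


samples = [100, 102, 105, 105, 103, 90]
reconstructed = dpcm_decode(dpcm_encode(samples, step=4))
```

Because the encoder predicts from its own reconstruction rather than from the
original samples, the quantization error does not accumulate: each
reconstructed sample differs from the input by at most half a step.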

2.2.2 Transform Coding


An alternative to predictive coding is transform coding. The Discrete Cosine Transform
(DCT) is the most commonly used transform coding technique, in which square
subregions of the image are processed with a discrete cosine transform. Conceptually
a one dimensional DCT can be thought of as taking the Fourier Transform of an
infinite sequence (see Figure 2.2). For a spatial image f(i, j) the two-dimensional
discrete cosine transform F(u, v) is given by

    F(u, v) = \frac{2}{N} C(u) C(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} f(i, j)
              \cos\frac{(2i+1) u \pi}{2N} \cos\frac{(2j+1) v \pi}{2N}        (2.4)

The inverse transform is given by

    f(i, j) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u) C(v) F(u, v)
              \cos\frac{(2i+1) u \pi}{2N} \cos\frac{(2j+1) v \pi}{2N}        (2.5)

where

    C(w) = 1/\sqrt{2}    for w = 0
    C(w) = 1             for w = 1, 2, ..., N-1

Here N is the width of the image block, and u and v range from 0 to N-1.
The reason behind using the DCT is that for correlated or low frequency sources the
DCT tends to concentrate the energy into a very few coefficients. The coefficients
containing most of the energy can be used to approximate the source output. Images
tend to have large regions of low spatial frequency. This makes the DCT a very useful
transform to use with images.
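
The energy compaction property can be checked numerically. The following
Python sketch is an illustration only: it assumes the orthonormal scaling
(2/N) C(u) C(v), which reduces to the factor 1/4 used for the 8 by 8 transform
in Chapter 3, and it uses a made-up smooth test block rather than real image
data.

```python
import math


def c(w):
    """Normalization term C(w) of the DCT."""
    return 1 / math.sqrt(2) if w == 0 else 1.0


def dct2(block):
    """Forward 2-D DCT of an N x N block (assumed orthonormal scaling)."""
    n = len(block)
    return [[(2 / n) * c(u) * c(v) * sum(
                 block[i][j]
                 * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                 * math.cos((2 * j + 1) * v * math.pi / (2 * n))
                 for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]


def idct2(coeffs):
    """Inverse 2-D DCT matching dct2."""
    n = len(coeffs)
    return [[(2 / n) * sum(
                 c(u) * c(v) * coeffs[u][v]
                 * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                 * math.cos((2 * j + 1) * v * math.pi / (2 * n))
                 for u in range(n) for v in range(n))
             for j in range(n)] for i in range(n)]


# A smooth (low spatial frequency) hypothetical 8 x 8 block.
block = [[100 + 2 * i + 3 * j for j in range(8)] for i in range(8)]
coeffs = dct2(block)
total_energy = sum(value ** 2 for row in coeffs for value in row)
dc_fraction = coeffs[0][0] ** 2 / total_energy
```

For this smooth block almost all of the energy ends up in the DC coefficient
`coeffs[0][0]`, and `idct2(coeffs)` recovers the block up to floating point
rounding, which is the behaviour the paragraph above describes.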

DCT-Based Compression

Figures 2.3 and 2.4 show the key processing steps which are the heart of the
DCT-based modes of operation. These figures illustrate the compression of a grayscale
image. As can be seen from Figure 2.3, each of the 8 x 8 blocks of the image
makes its way through each of the processing steps and gets compressed. Color
image compression can be thought of as the compression of multiple grayscale images,
either all together (i.e., all the components) or one at a time.

The DCT is related to the Discrete Fourier Transform (DFT) [3]. Each
N x N block is an N^2-point discrete signal. The forward DCT (FDCT) takes such a
signal as its input and decomposes it into N^2 orthogonal basis signals. The DCT
coefficient values can thus be regarded as the relative amounts of the 2D spatial
frequencies contained in the N^2-point input signal. The coefficient with zero
frequency in both dimensions is called the "DC coefficient" and the remaining
coefficients are called the "AC coefficients".

At the decoder the inverse DCT (IDCT) reverses this processing step. It takes N^2
DCT coefficients and reconstructs the N x N block. In principle, the DCT introduces
no loss to the source image samples; it merely transforms them to a domain in which
they can be more efficiently encoded.

Figure 2.4: DCT-Based Decoder Processing steps

2.3 Inter Frame Processing

The amount of compression possible by spatial processing alone is limited. A very 
high degree of temporal correlation exists whenever there is little motion in the 
scene. Even if there is movement, high correlation may still exist depending on the 
spatial characteristics of the image. 

2.3.1 Motion Compensation Estimation 

In any temporal compression scheme the signal is compressed by first predicting how 
the next frame will appear and then sending the difference between the prediction 
and the actual image. A reasonable prediction would be the previous frame. This 
sort of temporal differential encoding is very similar to Differential Pulse Code Mod- 
ulation (DPCM) and performs very well when the motion between adjacent frames 
is insignificant. If there is significant motion however this scheme would perform 
worse than if the next frame had simply been coded by itself. 

Motion compensation and estimation is a process of improving the performance
of any temporal compression scheme when motion occurs. In this procedure, the
displacement between the previous frame and the present frame needs to be calculated.
If this information is known at the decoder site, then the previous frame can be
shifted or displaced in order to obtain a more accurate prediction of the next frame
that has yet to be transmitted. Motion displacement could be generated on a frame,
partial frame or pixel basis. Motion vectors (displacements) are generally calculated
on a partial frame basis (with the area of the portion chosen to equal a superblock).
On a pixel basis it would be too expensive to calculate the motion vectors at the
encoder and provide the information to the decoder, while a single motion vector
for an entire frame is not very useful. The dimensions of the superblock vary from
implementation to implementation.

The process of displacing portions of a previous frame in order to predict the
next frame is shown in Figure 2.5.
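
A minimal Python sketch of block-based motion estimation follows. The text does
not specify the matching criterion or the search strategy, so the sum of
absolute differences (SAD) and the exhaustive search used here are assumptions,
as are the function names and the synthetic test frames.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between a block of cur and a block of ref."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(size) for j in range(size))


def find_motion_vector(cur, ref, top, left, size, search):
    """Exhaustively search ref for the best match to the block of cur.

    Returns ((dy, dx), cost), where (dy, dx) is the displacement into the
    reference frame that minimizes the SAD within +/- search pixels.
    """
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, top, left, top, left, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue                    # candidate falls outside the frame
            cost = sad(cur, ref, top, left, ry, rx, size)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost


# Synthetic frames: the present frame is the previous frame moved down by
# one pixel and right by two pixels.
ref = [[y * 16 + x for x in range(16)] for y in range(16)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0
        for x in range(16)] for y in range(16)]
vector, cost = find_motion_vector(cur, ref, top=6, left=6, size=4, search=3)
print(vector, cost)   # prints (-1, -2) 0
```

The returned vector points from the block in the present frame back to its
best match in the previous frame, which is the displacement a decoder would
apply to form its prediction.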

2.4 Rate Buffer Control 

Since the output rate for channel transmission is fixed while the data is variable
length Huffman coded, it is necessary to control the output rate with a rate buffer.
This rate buffer is normally implemented as a one frame FIFO (First In First Out)
after the Huffman coder.

The FIFO input rate is continuously monitored and the quantization level is
adjusted to prevent buffer overflow or underflow. As the quantization level decreases,
the block length increases and the FIFO input rate increases, while an increase in
quantization level causes the block length to decrease and hence the FIFO input
rate to decrease.

Figure 2.5: Using Motion Compensation To Predict Next Frame






Chapter 3 

Designing A Video Codec 

3.1 A Basic Video Coder

A basic video coder has five stages: a motion compensation stage, a
transformation stage, a lossy quantization stage, and two lossless coding stages. The motion
compensation stage, like DPCM, takes the difference between the present image and the
previous image if they are alike. The transform concentrates the information in
a few coefficients, and the quantizer is responsible for selecting the high energy DCT
coefficients. The two coding stages compress the data close to their symbol entropy.
The overall coding scheme is considered lossy since the reconstructed image is not
exactly the same as the original image (due to the quantizer). Lossless coders (without
a quantization stage) have been found to achieve very poor compression.

The framework of a basic video codec is given in Figure 3.1.

Figure 3.1: Block diagram of Video Codec: (a) Coder and (b) Decoder

3.1.1 Motion Compensation Estimation
Since most frames in an image sequence look alike (except during shifts due
to movement), the differences between the blocks are coded rather than the blocks
themselves. The motion compensation model for a basic codec is shown in
Figure 3.2.

The motion compensation model (shown in Figure 3.2) separates the motion
sequences into three types of frames: intraframes, which are coded
without any prediction; forward predicted frames, which are predicted based on the
intraframes; and bidirectionally predicted frames, which are predicted based on either
the intraframes or the forward predicted frames.

3.1.2 Transform Stage 

The transform stage is used to concentrate the energy into a few coefficients of the
block. The image is normally divided into small blocks to reduce the complexity
of this stage. The transform method chosen by CCITT is the two dimensional 8 by
8 DCT. The formula for the two dimensional 8 by 8 DCT can be written as

    F(u, v) = \frac{1}{4} C(u) C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} f(i, j)
              \cos\frac{(2i+1) u \pi}{16} \cos\frac{(2j+1) v \pi}{16}        (3.1)

where

    C(w) = 1/\sqrt{2}    for w = 0
    C(w) = 1             for w = 1, 2, ..., 7

Figure 3.2: The Motion Compensation Model For A Simple Codec (frame types:
intra-frame, forward predicted frame, bidirectionally predicted frame;
priority: Intra > Forward > Bidirectional)

The output of the 8 by 8 DCT is arranged so that the average value of the block (the DC
coefficient) is in the upper left corner. Progression from left to right represents an
increasing number of vertical edges, while progression from top to bottom represents
an increasing number of horizontal edges.

The inverse 2D DCT can be written as

    f(i, j) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u) C(v) F(u, v)
              \cos\frac{(2i+1) u \pi}{16} \cos\frac{(2j+1) v \pi}{16}        (3.2)

3.1.3 Quantization 


The DCT coefficients are quantized to increase the number of zero valued
coefficients. The DC and the AC terms of the DCT blocks are quantized separately.
Quantization is the lossy stage in the coding scheme. The image quality deteriorates
if the quantization is too coarse, while bits are wasted on coding noise if
the quantization is too fine.


3.1.4 Coefficient Scanning 

The quantized DCT coefficients are arranged in a zig-zag pattern (see Figure 3.3).
Zig-zag pattern scanning arranges the DCT coefficients in ascending frequency order.
The assumption behind zig-zag pattern scanning of DCT coefficients is that the
lower frequency components tend to have higher values than the higher frequency
components. In images the high frequency DCT coefficients are normally zero.
Hence, zig-zag pattern scanning helps in accumulating zeroes towards the end of
the block and helps in reducing the number of transmitted coefficients.

DC coefficients are encoded by the number of significant bits followed by
the bits themselves, while the AC coefficients are encoded based on the number of
zeroes before the next non-zero coefficient.

The inverse run-length coder translates the coded stream into either a DC
coefficient or a run-length followed by an AC coefficient. The zero coefficients (based
on the run length) are appended to the buffer, followed by the non-zero AC
coefficients.
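
The scanning and run-length steps described above can be sketched in Python as
follows. The sketch assumes the common JPEG/MPEG style zig-zag order and
represents each AC coefficient as a hypothetical (run of zeroes, value) pair;
the trailing zeroes are left implicit and restored from the block size. The
bit-level encoding of the DC coefficient is not modelled.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def run_length_encode(block):
    """Return (dc, pairs) where pairs holds (zero_run, value) for AC terms."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    dc, ac = scan[0], scan[1:]
    pairs, run = [], 0
    for value in ac:
        if value == 0:
            run += 1                     # count zeroes before the next value
        else:
            pairs.append((run, value))
            run = 0
    return dc, pairs                     # trailing zeroes are implied


def run_length_decode(dc, pairs, n=8):
    """Rebuild the n x n block from the DC value and the (run, value) pairs."""
    scan = [dc]
    for run, value in pairs:
        scan.extend([0] * run)           # append the zero coefficients
        scan.append(value)               # then the non-zero AC coefficient
    scan.extend([0] * (n * n - len(scan)))
    block = [[0] * n for _ in range(n)]
    for (r, c), value in zip(zigzag_order(n), scan):
        block[r][c] = value
    return block


quantized = [[0] * 8 for _ in range(8)]
quantized[0][0], quantized[0][1], quantized[1][0], quantized[2][1] = 50, -3, 2, 1
dc, pairs = run_length_encode(quantized)
print(dc, pairs)   # prints 50 [(0, -3), (0, 2), (5, 1)]
```

Only three pairs are needed for the 63 AC coefficients of this block, because
the zig-zag scan pushes the zero valued high frequency coefficients to the end.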


3.1.5 Entropy Coding 

The final processing step for a basic video codec is entropy coding. This step achieves
additional (lossless) compression based on the statistical characteristics of the
quantized coefficients. The most commonly used entropy coding scheme is Huffman
coding. To compress data symbols, the Huffman coder creates shorter codes for
frequently occurring symbols and longer codes for rarely occurring symbols.
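
As an illustration of how a Huffman coder assigns code lengths, here is a
short Python sketch that builds a code table from symbol counts. The symbol
stream is made up for the example, and a practical codec would normally use
predefined tables rather than building one per block.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:                      # degenerate single-symbol source
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, partial code table for the subtree).
    heap = [(count, index, {symbol: ""})
            for index, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        count0, _, codes0 = heapq.heappop(heap)   # two least frequent subtrees
        count1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in codes0.items()}
        merged.update({s: "1" + code for s, code in codes1.items()})
        heapq.heappush(heap, (count0 + count1, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical stream of quantized values: small values dominate.
stream = [0] * 8 + [1] * 4 + [2] * 2 + [3] + [4]
table = huffman_code(stream)
total_bits = sum(len(table[s]) for s in stream)
```

For this stream the most frequent symbol receives a one-bit code and the two
rarest symbols receive four-bit codes, so the 16 symbols are coded in 30 bits
instead of the 48 bits a fixed three-bit code would need.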


3.2 Error Correction Protection

The problems inherent in the packet video are packet loss and packet delay. It is 
therefore necessary for coding techniques compatible with packet video to consider 
these problems. 

A simple technique to protect the information packets is to add parity packets to
the existing information packets. A lost or delayed packet could then be recovered
by initiating error correction. This scheme, however, has a disadvantage in that it
increases the number of packets transmitted and hence contributes to the loss of
packets. However, a good error correction scheme is supposed to more than make up
for the loss of packets due to the increased rate.

A single error correcting (7,4) Hamming code was implemented and incorporated
with the video coder to protect the information packets. The decoder was modified
to perform error correction only when one packet was lost. This scheme was found
to work well for the following two reasons:

• the probability of losing a single packet is higher than the probability of losing
more than one packet. Hence most of the time only a single packet is lost
(even after taking the increased rate into consideration)

• it does not corrupt the correct packets in the process of recovering the lost
packets.
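
For reference, a single error correcting (7,4) Hamming code can be sketched at
the bit level as below. This is an illustration of the code itself, using the
conventional layout with parity bits in positions 1, 2 and 4; how the codeword
positions are mapped onto information and parity packets in the coder
described here is not specified in the text and is not modelled.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4        # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4        # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4        # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word):
    """Correct at most one bit error and return the 4 data bits."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    position = s1 + 2 * s2 + 4 * s4      # 1-based index of the bad bit, 0 if none
    if position:
        w[position - 1] ^= 1
    return [w[2], w[4], w[5], w[6]]


codeword = hamming74_encode([1, 0, 1, 1])
damaged = list(codeword)
damaged[4] ^= 1                          # flip one bit "in transit"
assert hamming74_decode(damaged) == [1, 0, 1, 1]
```

The syndrome directly gives the position of a single error, so correct bits
are never modified when exactly one error occurs. With two or more errors the
decoder would miscorrect, which is consistent with restricting correction to
the single-loss case.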


Chapter 4 


Error Concealment 


In packet switched networks, even with error correction protection, packet loss is
unavoidable. Hence a method of error recovery, which takes the characteristics of
video signals into account, is required. This process, known as error concealment,
attempts to recover or reconstruct the missing blocks from the structured picture
data. In this chapter various methods of error concealment are discussed. The
underlying assumptions in all these methods are

• pixels in the image are much smaller than any of the important details

• most of the pixels' neighbors represent the same structure.

A method of error concealment based on motion estimation is proposed and the
results are presented.

4.1 Intra Frame Processing

4.1.1 Block Averaging 

This method of error concealment involves replacing the missing block in the frame
by the average of the surrounding blocks. The basic assumption in this method
of error concealment is that the neighboring blocks represent the same structure.
The process of averaging the surrounding blocks to replace the missing block can
be done either in the spatial domain (spatial averaging) or in the frequency domain
(spectral averaging). Both spatial averaging and spectral averaging perform well
when the missing blocks are not at the edges. Figures 4.1 and 4.2
show a frame (Susie sequence) obtained after performing spectral and spatial error
concealment on a frame in which 1% of the blocks were randomly thrown away.
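
Spatial averaging can be sketched in a few lines of Python. The sketch assumes
that the frame is a list of pixel rows, that the blocks above, below, left and
right of the missing block were received intact, and that those four blocks
are the ones averaged; the text does not say which neighbors are used.

```python
def conceal_by_spatial_average(frame, block_row, block_col, size):
    """Fill a missing block with the pixel-wise mean of its neighbor blocks.

    block_row and block_col index the missing block in units of blocks.
    The frame is modified in place and also returned.
    """
    rows, cols = len(frame) // size, len(frame[0]) // size
    neighbors = [(block_row + dr, block_col + dc)
                 for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                 if 0 <= block_row + dr < rows and 0 <= block_col + dc < cols]
    top, left = block_row * size, block_col * size
    for i in range(size):
        for j in range(size):
            total = sum(frame[r * size + i][c * size + j] for r, c in neighbors)
            frame[top + i][left + j] = total / len(neighbors)
    return frame


# Hypothetical 12 x 12 frame of 4 x 4 blocks whose brightness rises from left
# to right; the center block is "lost" and then concealed.
frame = [[10 * (x // 4) for x in range(12)] for y in range(12)]
for y in range(4, 8):
    for x in range(4, 8):
        frame[y][x] = 0
conceal_by_spatial_average(frame, block_row=1, block_col=1, size=4)
```

In this smooth example the concealed block takes the value 10 throughout,
which matches the lost data. Across a sharp edge the average would blur the
edge, which is why the method degrades when the missing block lies on one.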

4.2 Inter Frame Processing 

4.2.1 Block Replacement 

In this method the missing blocks in a frame are replaced by the blocks in the
corresponding location from the previous frame (see Figure 4.5). This method
works well if there is not much motion between the two successive frames.
Figures 4.6 and 4.7 show two successive frames without significant motion between
them. The missing blocks in Figure 4.7 were filled by the blocks from the
preceding frame (Figure 4.6).

Figure 4.3: Spatial Error Concealment (Football Sequence)

Figure 4.4: Spectral Error Concealment (Football Sequence)

The missing blocks in Figure 4.9 were filled by the corresponding blocks from the
previous frame (see Figure 4.8). This is an example where there is significant
motion between two successive frames.

4.3 Error Concealment Model Based on Motion Estimation

Since the frames of a video coded sequence are motion compensated, it is necessary
to consider the type of the frame before performing error concealment. The motion
compensation model for the video coded sequence is shown in Figure 3.2. This model
divides a motion sequence into three different types of frames: intraframes,
forward predicted frames and bidirectionally predicted frames (refer to Section 3.1.1).

The flowchart of the model (see Figure 4.10) explains the various steps involved
in the error concealment for the different types of frames (I, B or P). This model takes
advantage of the fact that most frames in a motion sequence look alike, and uses the
information from the previous and/or next frames to fill in the missing blocks in
the present frame. This model estimates the motion of the present frame (the frame
with the blocks missing) with respect to the previous and/or the next frame (depending
on the type of the frame) and compares the matching error with a threshold (T), to
take care of scene changes or significant motion.


Intraframes:

The intraframes are error corrected before performing error correction on the forward
predicted and bidirectionally predicted frames. These frames were reconstructed by
estimating the motion with the previous forward predicted frame.

Figure 4.5: Replacing Blocks From Previous Frame

Figure 4.10: Flowchart of Error Concealment Model (legend: Y - Yes, N - No,
I - Intraframe, P - Forward Predicted Frame, B - Bidirectionally Predicted Frame)

Forward Predicted Frames: 

The forward predicted frames are error reconstructed using the information from 
the previous intraframe. Again motion is estimated between the two frames before 
replacing the missing blocks. 

Bidirectionally Predicted Frames: 

To reconstruct the missing blocks in the bidirectionally predicted frames, the
information in both the adjacent frames was used. Motion was estimated between the
present frame and the previous frame, and between the present frame and the next frame.
Missing blocks in the present frame were then replaced by performing a frequency
domain interpolation between the blocks obtained from the previous and the next
frame.

The process of motion estimation involves taking the four blocks surrounding a 
missing block and moving through a predefined search space (starting with the same 
location) in the previous and/or the next frame and comparing the error against a 
threshold (T). The threshold (T) was found to depend on the activity in the block 
and variance (of the blocks surrounding the missing block) was found to be a good 
estimate of the activity in the region. 
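
The search just described can be sketched in Python as follows. Several
details are assumptions made only for illustration: the matching error is the
mean squared difference over the four surrounding blocks, the threshold is
taken as T = scale * variance with a hypothetical parameter `scale`, a single
reference frame is used, and the missing block is assumed to lie in the
interior of the frame.

```python
def surrounding_pixels(top, left, size):
    """Coordinates of the pixels in the four blocks around a missing block."""
    coords = []
    for block_top, block_left in ((top - size, left), (top + size, left),
                                  (top, left - size), (top, left + size)):
        coords.extend((block_top + i, block_left + j)
                      for i in range(size) for j in range(size))
    return coords


def conceal_from_reference(cur, ref, top, left, size, search=2, scale=1.0):
    """Fill the missing block of cur from ref if a good enough match exists.

    Returns True if the block was replaced, False if the best matching
    error exceeded the threshold (e.g. at a scene change).
    """
    height, width = len(ref), len(ref[0])
    coords = surrounding_pixels(top, left, size)
    values = [cur[y][x] for y, x in coords]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    threshold = scale * variance             # assumed form of the threshold T

    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not all(0 <= y + dy < height and 0 <= x + dx < width
                       for y, x in coords):
                continue                     # search window leaves the frame
            error = sum((cur[y][x] - ref[y + dy][x + dx]) ** 2
                        for y, x in coords) / len(coords)
            if best is None or error < best[0]:
                best = (error, dy, dx)

    if best is None or best[0] > threshold:
        return False
    _, dy, dx = best
    for i in range(size):
        for j in range(size):
            cur[top + i][left + j] = ref[top + dy + i][left + dx + j]
    return True
```

When the function returns False the caller would fall back to another
technique, for example block averaging within the frame. For a bidirectionally
predicted frame the same search would be run against both the previous and the
next frame before interpolating between the two candidate blocks.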

The following figures show the first twenty four frames of the (original and
reconstructed) Susie and Football sequences (in which 10% of the packets were lost during
transmission).

The PSNR for the first twenty four frames before and after reconstruction for the
Susie and Football sequences is shown in Figures 4.11 to 4.26. A significant increase
in PSNR values was observed with error concealment, as can be seen from the
graphs for the first twenty four frames. The artifacts due to error concealment were
less observable in the Football sequence than in the Susie sequence (even though the loss
of packets is roughly the same) because of the fast motion of the objects. All the
sequences used in this chapter are contained in an accompanying video tape for
subjective evaluation.

Figure 4.12: Frames 13:24 (Original Susie Sequence)

Figure 4.17: Frames 1:12 (Reconstructed Football Sequence)

Figure 4.18: Frames 13:24 (Reconstructed Football Sequence)




























Chapter 5 


Conclusion 


Packet loss in packet switched networks is inevitable even with error correction,
as was mentioned before. Hence there is a need for error concealment at the
receiver. In this thesis a method of error concealment based on motion estimation is
proposed. This method takes advantage of the fact that most frames look alike and 
hence uses the information from the previous and the next frames (where possible) 
to conceal the errors (missing packets) by estimating the motion. The method 
proposed was implemented together with a standard MPEG video codec and tested 
for its performance on two motion sequences (Susie and Football). For both the 
sequences a significant increase in PSNR was observed with error concealment. 

An interesting aspect of the proposed method is its response at scene cuts. It
is obvious that motion compensation from the previous frame is meaningless in
such cases, as the contents of the previous frame and the current frame are entirely
different. This method, however, does work better than other error concealment
methods which use only the past information to perform error concealment [4].

In some frames (particularly where the motion was complicated) this method 
resulted in degradation of picture quality (as can be observed on the accompanying 
video). This however can be attributed more to the poor performance of the mo- 
tion estimation algorithm and the incorrect motion compensation than to the error 
concealment procedure. 

1 Introduction


Vector Quantization (VQ) has been found to be an efficient image compression technique
due to its ability to approximate patterns in the source output [6]. Conventional VQ based
techniques partition an image into non-overlapping blocks which are then raster scanned and
quantized. The codebooks are constructed from averages of similar patterns in a training
set. Because of the way they are obtained, the codebook vectors tend to be smooth. Even
when explicit efforts are made to include high frequency vectors such as edge vectors, the
number of such entries is limited by the relatively small size of the codebook [15]. Therefore
the likelihood of finding a good approximation to a smooth vector is much higher than that
of finding a good match to a high frequency or edge vector. While most blocks in an image
do not contain edges, edges are perceptually very important, and coarse representations of
these blocks can lead to a substantial degradation in perceptual quality. Furthermore, edges
are very important in a number of image processing applications, such as classification and
pattern recognition. Edge degradation can adversely affect such applications.

Several solutions have been proposed to improve edge reproduction. Some of the more
well known ones are Classified VQ, Finite-State VQ, and Predictive VQ [7]. In this paper we
present an alternative solution to the edge degradation problem. Instead of trying to increase
the number of codebook entries which more closely match the high frequency patterns in
the input to the Vector Quantizer, our approach is to try and reduce the number of high
frequency vectors at the VQ input. Our approach minimizes the number of vectors with
abrupt intensity variations by using an appropriate scan to partition an image into vectors.

In Figure 1 (left) we show a 5 x 5 segment taken from the Lena image plotted as a surface,
with the vertical axis representing the intensity value and the horizontal axes representing
spatial co-ordinates. The segment contains an edge, that is, an abrupt change in intensity, as we
move from left to right. In Figure 1 (right) we show two different vectors that were obtained
by raster scanning the block. If we scan the block row by row, going from left to right,
we obtain the vector plotted using a solid line. The vector formed in this manner has 4
peaks, each representing a transition from the end of one row to the beginning of another.
Alternatively, if we scan the block column by column, going from top to bottom, we obtain
the vector shown in the same plot as a dotted line. This vector is much smoother than the
vector obtained using the left to right scan, and is much more likely to have a codebook
entry ‘close’ to it. We can see that the manner in which vectors are formed from a block
influences the nature of the resulting vectors, which in turn can influence the fidelity of their
representation. This argument can in fact be extended to the entire image.

Figure 1: An image block (left) and two different vectors formed by scanning the block
(right).

Therefore, we would like to find a systematic method for scanning the image which 
would result in vectors that are smooth in some sense, and therefore more likely to find close 
matches. In this paper we address this question and give a novel technique for extracting 
vectors from an image which when quantized give much better results compared to vectors 
obtained by standard techniques. 

The idea of scanning an image in an ‘efficient manner’ in order to improve performance
of a compression scheme is certainly not new. In terms of practical image compression
techniques, work to date has concentrated on scanning the image using fixed scans that
exploit the inherent two-dimensional relationships that are present in image data. In fact,
the (perhaps) earliest work investigating alternative scanning techniques was done by Wyner
with the objective of scrambling video signals in order to protect against eavesdropping
[18, 19]. Later, Matias and Shamir [10] showed that using pseudo-random space filling
curves for scrambling actually results in reducing bandwidth required for transmission.

By far the most popular of such special scanning techniques have been the discrete 
approximations of Hilbert and Peano space filling curves. In fact, Ziv and Lempel have 
shown that the problem of optimally compressing n-dimensional data can be reduced to 
that of optimally compressing a 1-dimensional string by using a discrete approximation of 
a Hilbert space filling curve [9]. However, their optimality result is asymptotic and the 
scheme they propose is not practically applicable to gray scale images. Nevertheless, the 
Hilbert scan has been effectively used to enhance the performance of a variety of image 
compression techniques. In [16] a Hilbert scan is used to rearrange pixels prior to vector 
quantization. An image compression technique based on a wavelet transform of vectors 
extracted by performing Hilbert/Peano like scans is reported in [2]. Yang et al. [20] use
Peano scanning along with fractal coding to compress still images. Cole [4] has used Peano
and Hilbert scans for data compaction of raster graphics. 

In the next section we review some ideas from previous work and briefly introduce the
notion of scan models. In sections three and four we show how scan models can be used to
enhance the performance of vector quantization, in particular for multi-spectral data sets.
We present comparisons of the proposed technique with standard techniques and show that
the proposed techniques compare favorably. Finally, we conclude in section five with a brief
summary and pointers to future work.

2 Scan Models



In this section we review scan models and related concepts from previous work [14]. We
consider a digital image P to be an M x N array of integers such that 0 ≤ P[m, n] ≤ L − 1 for
0 ≤ m < M and 0 ≤ n < N. The notion of ‘adjacency’ between pixels in an image is often
based on the 4-neighborhood model or the 8-neighborhood model, the adjacency graphs of
which, A_4 and A_8, are shown in Figure 2. An image P induces a weighting function on
the edges of an adjacency graph if we assign the weight on an edge to be the difference in
intensity values of the two pixels corresponding to the vertices incident upon the edge. We
call the weighted version of an adjacency graph induced by an image P the difference
graph of P. An image and its difference graph using the 4-neighborhood model are shown
in Figure 3.

Given an adjacency graph, we call any spanning tree of the graph a scan model. A scan
model specifies an order for traversing the pixels of an image, for the given neighborhood
scheme. Standard traversal schemes like the raster scan and the Hilbert scan [9] are special
cases of a scan model. A scan model can also be viewed as a non-causal prediction model for
an image. For example, the scan model on the left in Figure 4 specifies that the prediction
for pixel (1, 2) should be the intensity value of its neighbor on its left, that the right neighbor
of pixel (1, 3) is to be used as a prediction for its value, and so on. In a similar manner the
prediction scheme for each pixel is specified. In this paper we use the first interpretation,
that is, we view a scan model as specifying a traversal of an image.

Figure 4: Two prediction trees for the image in Figure 1

If we are to use scan models for image processing tasks, then we would be interested in a
model that is optimal with respect to some objective function that depends upon the specific
application at hand. In [14] we look at a few objective functions and investigate algorithms
for constructing optimal models. What is interesting is that, given our formulation, the
problems related to finding good models can be abstracted as graph problems, that is,
problems which involve constructing a spanning tree of the difference graph with desired
properties.

A natural objective function to minimize is the sum of absolute weights on the edges
corresponding to a scan model; we call the resulting model an MAW scan model. An MAW scan
minimizes the absolute sum of differences between successive pixels in the scan. Computing an
MAW scan involves finding a minimum weight spanning tree of the difference graph, after the
original weights are replaced by their absolute values. A minimum weight spanning tree of
a weighted graph can be computed in time O(MN log log (MN)) [3]. In our case, since the
graph is sparse, a minimum weight spanning tree can be computed in time O(MN log* MN)¹
[5], which for all practical purposes amounts to O(MN).
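
The construction described above can be illustrated with a minimal sketch. It assumes the 4-neighborhood model and uses Kruskal's algorithm with a union-find structure, which runs in O(MN log MN) rather than the near-linear bounds cited in the text; the function names `difference_graph` and `maw_scan_tree` are illustrative and do not come from the paper.

```python
def difference_graph(image):
    """Edges (|difference|, u, v) of the 4-neighborhood difference graph."""
    rows, cols = len(image), len(image[0])
    edges = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # edge to the right neighbor
                edges.append((abs(image[r][c] - image[r][c + 1]), (r, c), (r, c + 1)))
            if r + 1 < rows:  # edge to the bottom neighbor
                edges.append((abs(image[r][c] - image[r + 1][c]), (r, c), (r + 1, c)))
    return edges


def maw_scan_tree(image):
    """Minimum weight spanning tree of the absolute difference graph (Kruskal)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(difference_graph(image)):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so keep it
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Toy image with a single vertical edge between columns 1 and 2
image = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [10, 10, 50, 50],
]
tree = maw_scan_tree(image)
total_weight = sum(weight for _, _, weight in tree)
print(len(tree), total_weight)  # 11 40
```

In this toy image the tree has to cross the intensity edge exactly once, so the total absolute weight equals the size of that single jump.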

An MAW scan that was constructed for the USC-Girl image is shown in Figure 5. It
can be seen that unlike statistical models like context based models and linear models, scan
models are essentially ‘structural’ in nature. They capture the essential two-dimensional
structure inherent in an image. Hence, they could potentially be of use in a variety of image
processing tasks. In previous work we have investigated their application to lossless compression
of still images [13] and multi-spectral image data [12], and in lossy plus lossless image
compression [11]. In the rest of this paper we investigate their application to partitioning an
image into vectors prior to quantization.

¹ log* n = min{i > 0 : log^(i) n ≤ 1}, where log^(i) is defined by log^(1) n = log n and log^(i+1) n = log(log^(i) n).

Image       Blocks    MAW Scan
USC-Girl    136        48
Couple      133        42
Girl-1       96        20
Girl-2      216        38
House       330        67
Tree        638       227

Table 1: Comparison of variance of vectors from blocks and MAW scan

3 Using an MAW scan model to form vectors 

As we said before, in conventional VQ techniques image blocks that contain an edge result 
in high detail vectors, which usually result in visually annoying degradations in the recon- 
structed image. In this section we apply the notion of a scan model to address the edge 
degradation problem. Our approach minimizes the number of vectors with abrupt intensity 
variations by using an MAW scan to partition an image into vectors. 

An MAW scan by definition minimizes differences between successively scanned pixels.
Hence vectors formed by taking k successive pixel values along an MAW scan will be highly
correlated and can be clustered and quantized with less distortion. In order to test this
hypothesis we performed the following experiment on a standard test set of 256 x 256 RGB
images taken from the USC database. We first partitioned the image into 4 x 4 blocks and
computed the variance of each block. The mean of the variance values, rounded to the
nearest integer, is shown in the first column of table 1. We then computed an MAW scan for
the image and formed vectors of dimension 16 by performing a depth-first traversal of the
MAW tree. The variance of each vector was computed and the mean value is shown in the
second column of table 1. It can be seen that vectors obtained from an MAW scan contain
much less activity than those formed from k x k non-overlapping blocks.
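
The experiment above can be sketched on a toy image as follows. This is only an illustration under stated assumptions: the spanning tree is grown with Prim's algorithm from pixel (0, 0), ties between equal weights are broken arbitrarily by the heap, and children are visited in the order they were attached. The paper does not specify the root or the traversal order, and all function names here are hypothetical.

```python
import heapq
from statistics import mean, pvariance


def maw_scan_order(image):
    """Depth-first pixel order over a minimum absolute weight spanning tree (Prim)."""
    rows, cols = len(image), len(image[0])
    children, visited, heap = {}, {(0, 0)}, []

    def push_edges(r, c):
        for nr, nc in ((r, c + 1), (r + 1, c), (r, c - 1), (r - 1, c)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in visited:
                weight = abs(image[r][c] - image[nr][nc])
                heapq.heappush(heap, (weight, (nr, nc), (r, c)))

    push_edges(0, 0)
    while heap:
        _, node, parent = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        children.setdefault(parent, []).append(node)
        push_edges(*node)

    order, stack = [], [(0, 0)]
    while stack:  # iterative depth-first traversal from the root pixel
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(children.get(node, [])))
    return order


def block_vectors(image, k):
    """Vectors obtained by raster scanning non-overlapping k x k blocks."""
    rows, cols = len(image), len(image[0])
    return [[image[r + i][c + j] for i in range(k) for j in range(k)]
            for r in range(0, rows, k) for c in range(0, cols, k)]


def scan_vectors(image, dim):
    """Vectors of `dim` successive pixel values along the MAW scan."""
    values = [image[r][c] for r, c in maw_scan_order(image)]
    return [values[i:i + dim] for i in range(0, len(values), dim)]


def mean_variance(vectors):
    return mean(pvariance(v) for v in vectors)


# Toy 4 x 4 image: every 2 x 2 block straddles a vertical edge
image = [[10, 200, 200, 10] for _ in range(4)]
block_activity = mean_variance(block_vectors(image, 2))
scan_activity = mean_variance(scan_vectors(image, 4))
print(block_activity, scan_activity)
```

On this toy image every block vector contains the edge, while the scan keeps most same-intensity pixels together, so the mean variance of the scan vectors comes out lower than that of the block vectors.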

In order to test our second hypothesis that, for a fixed bit rate, vectors formed from
MAW scans can be quantized with less distortion than k x k blocks, we performed
another experiment. Here, we first used the Generalized Lloyd’s algorithm (GLA)
for generating a codebook of size 256 for each of the test set of images. Since the test
images were color images represented in the RGB domain, we took the green band for our
experiments. Vectors were formed by raster scanning 4 x 4 blocks. Columns 2 and 3 of table
2 show the SNR and PSNR values obtained by encoding each of the images with its local
codebook.

Image       Block VQ          Scan VQ
            SNR     PSNR      SNR     PSNR
USC-Girl    21.02   30.35     22.79   32.09
Couple      16.33   31.72     19.26   34.61
Girl-1      27.93   32.99     31.61   36.63
Girl-2      27.93   32.99     31.61   36.63
House       26.37   31.31     27.50   32.44
Tree        21.21   26.02     22.87   27.69

Table 2: Comparison of SNR and PSNR for Block VQ and Scan VQ.

We next used an MAW scan to form vectors which were then clustered by the same
Generalized Lloyd’s algorithm to form a codebook of the same size and dimension. Columns
4 and 5 of table 2 show the SNR and PSNR values obtained by encoding each of the images
with its own local codebook. We see a significant increase in SNR and PSNR when
vectors formed by using an MAW scan are quantized. We would like to point out that the
images have each been encoded by using local codebooks, that is, a codebook generated from
the image itself. This would not be done in practice, but our intention in this section was
only to demonstrate the validity of our approach.
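
The paper does not spell out how SNR and PSNR were computed. The sketch below uses common definitions as an assumption: PSNR relative to a peak value of 255 for 8-bit data, and SNR relative to the mean squared value of the original signal (some authors use the signal variance instead). Images are passed as flat lists of pixel values, and identical inputs (zero error) are not handled.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equally long pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def snr_db(original, reconstructed):
    """SNR in dB, assuming signal power is the mean squared pixel value."""
    signal_power = sum(a * a for a in original) / len(original)
    return 10 * math.log10(signal_power / mse(original, reconstructed))


def psnr_db(original, reconstructed, peak=255):
    """PSNR in dB, assuming an 8-bit peak value of 255."""
    return 10 * math.log10(peak ** 2 / mse(original, reconstructed))


original = [100, 110, 120, 130]
reconstructed = [101, 109, 121, 129]
print(round(snr_db(original, reconstructed), 2), round(psnr_db(original, reconstructed), 2))
```

Under these definitions the gap between PSNR and SNR depends only on the brightness of the original image, which is consistent with the gap being the same for Block VQ and Scan VQ within each row of table 2.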

The problem with using scan models for forming vectors prior to quantization is that an
optimal scan model will vary from image to image. Hence an encoding of the scan has to
accompany an encoding of the image. Unfortunately, due to the large number of possible
scans, the cost of encoding a scan is usually more than 1.5 bits per pixel [14]. Hence our
approach cannot be used for single frame images in a straightforward manner. For
multi-spectral data sets, however, the cost of encoding a scan can be avoided by making use of
spectral correlations. In the next section we show how this can be done.

4 Scan Predictive VQ 

Compression is generally achieved by removing inherent redundancies present in data. In the 
case of a multi-spectral data set, there are two sources of redundancy - spatial redundancy and 
spectral redundancy. By spatial redundancy we mean correlations among spatially adjacent 
pixels in the same spectral band. By spectral redundancy we mean correlations among pixels 
that have approximately the same spatial location but are in adjacent spectral bands. While 
spatial correlation is adequately exploited by standard Vector Quantization (VQ) techniques, 
variations of VQ that exploit spectral correlations have started emerging only recently. 

Gupta and Gersho [8] have recently proposed a paradigm for vector quantization of
multi-spectral data called Feature Predictive Vector Quantization. They point out that the
conventional approach to vector quantization of multi-spectral data, forming a vector that
spans spectrally adjacent blocks, leads to high complexity with reasonable block sizes. For
example, taking a block size of 4 x 4 and forming a vector X by concatenating blocks X_1
and X_2 from two spectrally adjacent bands leads to a vector X = (X_1, X_2) of dimension 32.
A bit rate of 0.5 bits per pixel then requires a codebook of size 65,536. In order to alleviate
this problem they extract a reduced dimensionality feature vector U from X and transmit
a quantized version of U from which an estimate X̂ of X is formed by the receiver. In this
section we present Scan Predictive VQ, a compression technique for multi-spectral data sets
that is based on the notion of scan models. The scheme retains a manageable complexity in
terms of vector dimension and at the same time effectively exploits spectral correlations.
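
The codebook size quoted above follows from simple arithmetic: a fixed-rate VQ spends (dimension x bits per pixel) bits on each vector, so the codebook needs 2 raised to that number of entries. A small sketch, with hypothetical helper names:

```python
import math


def codebook_size(vector_dim, bits_per_pixel):
    """Number of codewords needed for a fixed-rate VQ at the given rate."""
    bits_per_vector = round(vector_dim * bits_per_pixel)
    return 2 ** bits_per_vector


def bits_per_pixel(vector_dim, size):
    """Rate of a fixed-rate VQ with `size` codewords of dimension `vector_dim`."""
    return math.log2(size) / vector_dim


print(codebook_size(32, 0.5))    # 65536: two concatenated 4 x 4 blocks
print(codebook_size(16, 0.5))    # 256: a single 4 x 4 block at the same rate
print(bits_per_pixel(16, 1024))  # 0.625
```

Keeping the vector dimension at 16 instead of 32 therefore reduces the codebook, and with it the full-search cost, by a factor of 256 at the same bit rate.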

Correlations between spectral bands in a multi-spectral data set are a result of the fact
that the bands are imaging the same physical structures. Thus while pixel values in
neighboring bands may be very different, the relationships between a pixel and its neighbors may
be very similar in adjoining spectral bands. This relationship information is captured well
by a scan model. In fact, experiments have shown that an MAW scan of one band effectively
models the image in a spectrally adjacent band [12]. The third column of table 3 shows
the mean variance for vectors formed by using the MAW scan of the red band on the green
band of the test images.² The first two columns are the same as table 1. It can be seen that
although the variance is not as low as that obtained by using an optimal scan, using the
optimal scan of the previous band to extract vectors does lead to highly correlated vectors
as compared to using non-overlapping blocks. Similar results were obtained on the other
bands.

Image       Blocks    MAW Scan    Predictive Scan
USC-Girl    136        48          73
Couple      133        42          58
Girl-1       96        20          24
Girl-2      216        38          69
House       330        67         198
Tree        638       227         364

Table 3: Comparison of variance of vectors from blocks, MAW scan and Predictive scan

² A color image in the RGB domain is essentially a multi-spectral image formed by three sensors responding
to a narrow band of wavelengths centered around 700 nm (red), 546.1 nm (green) and 435.8 nm (blue)
respectively.

Hence, given a multi-spectral image, the first image in the sequence can be compressed
and transmitted by any conventional method, and subsequently we can use the optimal
model of the previous ((k − 1)th) image on the current (kth) image in the sequence in order
to form vectors of the required dimension. These vectors can then be quantized by any
of the vector quantization techniques described in the literature. This approach gives us
a simple and efficient backward adaptive technique that exploits both spatial and spectral
correlations. By backward adaptive we mean an adaptive technique in which both the
transmitter and receiver are in possession of the information necessary for adaptation. This
happens when the output of the transmitter (which is also available to the receiver) is used
for future adaptation. This has the advantage of obviating any necessity for transmission
of additional or ‘side’ information. However, as the current information can only be used
for future adaptations, there is necessarily a delay in the adaptation process. We call this
technique Scan Predictive Vector Quantization (SPVQ). Note that there is no cost incurred
in encoding the scan, since it is being constructed from the previous image.
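
The backward adaptive idea can be sketched as follows. The sketch assumes the scan order has already been derived from the reconstructed previous band, which both encoder and decoder hold; here a hand-written `order` stands in for that scan, and the two-entry codebook and all function names are purely illustrative. Only the codeword indices are transmitted.

```python
def extract_vectors(band, order, dim):
    """Read the band along the given scan order and cut it into vectors."""
    values = [band[r][c] for r, c in order]
    return [values[i:i + dim] for i in range(0, len(values), dim)]


def nearest_codeword(vector, codebook):
    """Full-search VQ: index of the codeword with the smallest squared error."""
    def squared_error(i):
        return sum((a - b) ** 2 for a, b in zip(vector, codebook[i]))
    return min(range(len(codebook)), key=squared_error)


def encode_band(band, order, codebook, dim):
    return [nearest_codeword(v, codebook) for v in extract_vectors(band, order, dim)]


def decode_band(indices, order, codebook, rows, cols):
    """Place the decoded values back at the pixel positions given by the scan."""
    values = [x for i in indices for x in codebook[i]]
    band = [[0] * cols for _ in range(rows)]
    for (r, c), x in zip(order, values):
        band[r][c] = x
    return band


# Stand-in for a scan of band k-1: it stays on one side of the edge, then the other
order = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 2), (1, 2), (1, 3), (0, 3)]
current_band = [[10, 10, 200, 200],
                [10, 10, 200, 200]]
codebook = [[10, 10, 10, 10], [200, 200, 200, 200]]

indices = encode_band(current_band, order, codebook, dim=4)
decoded = decode_band(indices, order, codebook, rows=2, cols=4)
print(indices)  # [0, 1]
```

Because the decoder rebuilds the same order from data it already has, the indices alone are enough to reconstruct the band and no scan description is sent.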

The codebook for Scan Predictive VQ is designed by using a GLA-like algorithm. The
design can be done using either an open-loop or closed-loop approach [7]. In the open-loop
approach, vectors from the current band k are extracted by using the MAW scan of the
original band k − 1 image. These vectors are then clustered by the generalized Lloyd’s
algorithm to obtain a codebook of the desired size. In practice, since the original image is
not available to the receiver, reordering with an MAW scan is only possible with respect to
the reconstructed image of the previous band. Hence the codebook is not optimal for the
actual data being used. However, if the resulting reconstructed image is of sufficiently high
quality, then it would be very close to the original and the codebook should give close to
optimal quality.

The codebook can also be constructed by using a closed-loop approach. In such an 
approach the codebook vectors from the current band are obtained by using an MAW scan 
of the reconstructed image in the previous band. We can see that in the closed-loop design 
process, the training sequence of vectors changes with every iteration and hence convergence 
to a local minimum is not guaranteed. However, it has been observed in practice that the 
closed-loop technique gives substantial improvement over the open-loop technique [6]. Our 
experience has been consistent with this observation and hence in the rest of this paper we 
report results only for the closed-loop design technique. 

In table 4, we give results obtained from two test images, California and Moffet, shown
in figure 6,³ for different types of VQ. In each case, the codebook was generated by using the
Moffet image and encoding results are presented for the California image. The Moffet image
was 350 x 512 and the California image 232 x 512. Both images had 8 spectral bands.

³ Thematic Mapper simulator data, acquired on NASA C-130 aircraft. Original data processed by P.-S.
Yeh at NASA GSFC to approximate Landsat 7 HRMSI resolution. The California image was taken over the
San Luis Reservoir and has a ground resolution of 5.7 m. The Moffet image has a ground resolution of 4.8 m.

Band        Block VQ          Scan Predictive VQ
            SNR     PSNR      SNR     PSNR
1           22.80   27.75     -       -
2           24.24   30.58     25.66   31.90
3           22.90   29.41     23.44   29.95
4           23.94   29.99     24.37   30.42
5           24.50   27.92     25.21   28.63
6           24.43   30.49     25.78   31.84
7           21.57   29.73     23.47   31.63
8           22.08   25.16     25.49   28.57
Average     23.38   29.02     24.77   30.42

Table 4: Comparison of SNR and PSNR for California Image.

First we used the Generalized Lloyd’s algorithm to construct a codebook of size 1024 
for vectors of size 16 that were formed by raster scanning 4x4 blocks of the Moffet image. 
The California image was then encoded by using full search VQ. Columns 2 and 3 give the 
SNR and PSNR values obtained. Experiments were also performed by forming a vector that 
spanned across two adjacent 2x4 blocks. This, however did not lead to any improvements in 
SNR and PSNR values and often resulted in poorer performance. This leads us to conclude 
that forming vectors by scanning two spectrally adjacent image blocks is not an effective 
technique for capturing spectral correlations. 

Next we designed a codebook of size 1024 from 16 dimensional vectors obtained from
a depth-first traversal along an MAW scan of the previous band for bands 2 to 8 of the
Moffet image, by using the closed-loop technique presented above. Bands 2 to 8 of the
California image were then encoded by using this codebook and quantizing vectors obtained
by a depth-first traversal of an MAW scan of the reconstructed image of the previous band.
The first band of the California image was encoded by using the JPEG standard [17]. The
particular implementation of JPEG used in this work was a public domain implementation
provided by the Independent JPEG Group. This implementation provides an input parameter
Q that controls the quality and bit rate of the compressed image. A value of 50 was used for
Q, giving a bit rate of 1.20 bits per pixel. This gives us an average rate of 0.69 bits per pixel
for the entire image. Columns 4 and 5 of table 4 show the SNR and PSNR obtained. We see
that an improvement of more than 1.5 dB is obtained on average. In figure 7 we show the
reconstructed band 5 of the California image obtained by using conventional VQ and also the
one obtained by using the closed-loop technique. We see that the edge artifacts in the image
obtained by SPVQ are considerably reduced. Also, note from table 4 that band 5 is where the
smallest gain in SNR/PSNR is obtained by SPVQ as compared to the other bands.

Image       Bpp for    Block VQ          SPVQ
            Red        Green    Blue     Green    Blue
USC-Girl    0.79       28.4     28.0     30.2     31.8
Couple      0.78       27.1     27.4     31.8     31.2
Girl 1      0.46       31.9     31.9     35.1     34.6
Girl 2      0.73       29.5     30.8     31.6     32.2
Average     0.69       29.2     29.5     32.2     32.5

Table 5: Comparison of PSNR Values for test images.

We also repeated our experiments for the RGB images listed in table 1. Here, we
constructed codebooks of various sizes from four images using the closed-loop design technique.
A different set of images was then vector quantized with this codebook. In table 5 we show
the PSNR values obtained with a codebook size of 4096 and vector dimension 16 for a few
images, none of which were a part of the training set. Also, the first band (red band) of each
image was encoded using the JPEG standard with Q = 50. The resulting bit rate for
the red band is shown in column 2 of table 5. The PSNR values for the green and blue bands
are shown in columns 3 to 6. We see that an improvement of 3 dB can be obtained by using an
appropriate scan to form vectors. The reconstructed USC-Girl images obtained from SPVQ
and block VQ are shown in figure 5. In the image obtained by block VQ we clearly see the
staircase effect, especially near the shoulders. The image obtained by SPVQ, on the other
hand, has no such artifacts. The original image appears in figure 5.
At this point we would like to make a couple of points. First, note that the first band
in the sequence has to be encoded by a conventional coding scheme. Hence, the quality of
the first image can potentially affect the performance of scan predictive VQ on the second
band, the quality of the reconstructed image for the second band influences the quality of
the third band, and so on. Our experiments seem to indicate that the results obtained are
robust with respect to the quality of the first image in the sequence. As long as the first
image is of reasonable quality, the quality of subsequent images, on average, remains
unaffected. Here, by reasonable quality we mean, for example, a Q factor of 50 or greater
when using the JPEG standard.

Second, we would also like to note that better PSNR values at comparable bit rates have 
been reported in the literature with sophisticated enhancements to the basic VQ technique. 
However, all such enhancements can easily be incorporated into the scheme presented here. 
We have deliberately used simple codebook generation, organization and search techniques 
so that a proper estimate of the gains made by alternative techniques of forming vectors can 
be obtained. 


5 Conclusions and Future Work 

We have seen that scan models can be used to develop a simple and effective solution to 
the problem of edge degradation encountered during vector quantization of a sequence of 
correlated images, like multi-spectral images, 3-D medical images or a video sequence. An 
MAW scan by definition minimizes differences between successively scanned pixels. If we 
have a sequence of correlated images, then our experiments have shown that an MAW scan 
of one image in the sequence effectively models the next image in the sequence. Using an 
MAW scan of the previous image to extract vectors from the current image and quantizing 
these vectors by usual vector quantization techniques leads to significant improvements in 
performance over conventional block VQ techniques. Besides simple VQ, in future work we 
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Figure 6: Original California (left) and Moffet (right) images, band 5 








Figure 7: California image (band 5) at 0.75 bpp after Block VQ (left) and SPVQ (right) 
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Figure 8: Girl-1 image (green band) at 0.75 bpp after Block VQ (left) and SPVQ (right). 
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ABSTRACT 

In this paper, we address the problem of accessing portions of
multiband data which has been losslessly compressed. An approach
that uses the fractal property of some well known space filling curves
to provide access to portions of a losslessly compressed data set is
described. This approach reduces the average amount of
decompression necessary to access any portion of the data set,
thereby reducing the amount of time required to access the
compressed data. Various tradeoffs exist which will be discussed
with practical examples.

INTRODUCTION

NASA's Mission to Planet Earth* will result in an enormous
increase in the amount of data that will need to be archived. This has
led to an increased interest in not only more efficient lossless
compression techniques, but also in faster methods of accessing
losslessly compressed data, in particular, accessing portions of large
multiband data sets.

There are several efficient lossless compression techniques
currently available. For instance, predictive coding schemes use
previous data points to generate a prediction for the current data
value. The prediction error is then losslessly coded. Context based
algorithms use the neighboring data values to determine the best
encoding scheme. Dictionary schemes build a library of previously
encountered patterns which can be used to encode patterns yet to be
encountered. All of these techniques have one thing in common:
they use information from previous data to encode future data
values. Therefore, it is necessary to start decoding at the beginning of
the data set. For example, if one needed to access a 128x128 section
of band 7 of a 512x512, 7 band data set, it may be necessary to
decode the entire data set.

In this paper, we present an algorithm which limits the amount of
decoding necessary to access any portion of a compressed data set,
thereby providing the user with convenient access to the data.

ENCODING ALGORITHM

The goal of this approach is to reduce the amount of past
information needed to access any particular data value in the set. This
will in turn reduce the amount of decoding necessary to retrieve any
portion of the data set. This was carried out by partitioning the data
set into smaller subsets and then losslessly encoding the subsets. Two
different methods of scanning the data subsets were used.


* This work was supported in part by the NASA Goddard Space
Flight Center under grant NAG5-1612.


Partitioning the Data Set 

The data set is partitioned into three dimensional subsets. For
instance, a 512x512, 7 band data set may be partitioned into 128x128,
1 band subsets. Figure 1 illustrates this partitioning.



Figure 1. Example of partitioned data set. 

The first data value in each subset is encoded using full
resolution. The subsequent data values can then be encoded using
any lossless compression technique. For this particular work we have
used a simple predictive coding approach in which the differences
between neighboring data values are Huffman coded.

The location of the first data value in the compressed file is kept 
in a code book. This allows the user to open the compressed file and 
advance the file pointer to the beginning of any particular subset and 
begin decoding at that point instead of starting at the beginning of the 
file. 

The encoding algorithm was tested on a Landsat-TM 512x512,
7 band data set. The results for various partitions are given in Table
1. As shown, the compressed file size and the additional code book
requirements increased as the size of the partitioned subsets
decreased. This indicates a tradeoff between storage requirements
and the ability to access small portions of data quickly.

Scanning the Subsets 

Two scanning patterns were used in testing the encoding 
algorithm. First, each band in each subset was scanned sequentially 
using a simple raster scan. Second, the Hilbert scanning pattern was 
used to scan each band in the subset sequentially. 
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Table 1 . Test results for proposed encoder algorithm using various subset sizes. 


Data Subset Description            Size of Data      Compressed    Additional Code Book
                                   Subset (bytes)    File Size     Requirements (bytes)
                                                     (bytes)

512 rows, 512 columns, 7 bands     1835008           971395           0
512 rows, 512 columns, 3 bands      786432           971895           8
512 rows, 512 columns, 2 bands      524288           971895          12
512 rows, 512 columns, 1 band       262144           971895          24
256 rows, 256 columns, 3 bands      196608           974040          44
256 rows, 256 columns, 2 bands      131072           974040          60
256 rows, 256 columns, 1 band        65536           974040         108
128 rows, 128 columns, 3 bands       49152           978028         188
128 rows, 128 columns, 2 bands       32768           978028         252
128 rows, 128 columns, 1 band        16384           978028         444
64 rows, 64 columns, 3 bands         12288           985164         764
64 rows, 64 columns, 2 bands          8192           985164        1020
64 rows, 64 columns, 1 band           4096           985164        1788


The raster scan, shown in Figure 2a, had the advantage of 
being easily implemented on any size subset. On the other hand, 
the Hilbert scan, shown in Figure 2b, had the advantage of 
allowing fast access to smaller portions of data at the cost of 
increased complexity and constraints on the subset size. 

Figure 2. Raster scanning pattern (a) and the Hilbert scanning 
pattern (b) for an 8x8 block. 

The Hilbert scan pattern for a 4x4 block is shown in Figure 2b. 
This pattern is then rotated by 90 and 180 degrees to form the 8x8 
block shown in Figure 2b. The 8x8 block is then rotated by 270 and 
[illegible] degrees to form a 16x16 block. This process is repeated until 
all the data points in each band have been scanned, thereby 
subdividing the data points into even smaller sub blocks. Notice 
that each rotation of the scanning pattern doubled the sub block 
size. Therefore, it is necessary to limit the subset size to a power of 
two. 

DECODING ALGORITHM 

The decoding algorithm was set up to allow the user to retrieve 
any desired data subset by specifying the coordinates of the first 
data value and the number of rows, columns and bands in the desired 
subset. The algorithm then searched the code book for the start of 
the encoded subsets which contained the requested data. The 
algorithm did not place any restrictions on the size of the desired 
data subset. 

The decoding algorithm was tested on a Landsat TM 
512x512, 7-band data set after it had been compressed into subsets 
of 128x128, 1 band using both the raster and Hilbert scanning 
patterns. Requests for data were assumed to be uniformly 
distributed over the entire data set. 

A set of uniformly distributed requests was generated using a 
random number generator to select the coordinates of the first data 
value. For simplicity, the size of the requested set was held 
constant at 256x256, 1 band. The number of data points decoded 
to retrieve the requested sets was calculated. The results of this 
test are given in Table 2. As shown, both scanning patterns 
produced approximately the same results. Therefore, if the 
assumption that the requests for data would be uniformly 
distributed over the entire data set was correct, the choice of 
scanning pattern would make very little difference. 
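
A test of this kind can be sketched for the raster scan as follows. The decoding model is an assumption made for illustration: every subset touched by a request is decoded from its first data value up to the last requested point it contains. The function names, the random seed, and the number of trials are illustrative, and the sketch is not claimed to reproduce the figures in Table 2.

```python
import random


def points_decoded_raster(row0, col0, height, width, subset=128):
    """Points decoded for one request when each touched subset is raster scanned."""
    last_r, last_c = row0 + height - 1, col0 + width - 1
    total = 0
    for sr in range(row0 // subset, last_r // subset + 1):
        for sc in range(col0 // subset, last_c // subset + 1):
            # Last requested row and column inside this subset, in local coordinates.
            local_row = min(last_r, sr * subset + subset - 1) - sr * subset
            local_col = min(last_c, sc * subset + subset - 1) - sc * subset
            # Decoding runs from the subset's first value to that point.
            total += local_row * subset + local_col + 1
    return total


def average_points_decoded(trials=1000, size=512, height=256, width=256, seed=0):
    """Average cost over requests whose first coordinate is uniformly distributed."""
    rng = random.Random(seed)
    costs = []
    for _ in range(trials):
        row0 = rng.randrange(size - height + 1)
        col0 = rng.randrange(size - width + 1)
        costs.append(points_decoded_raster(row0, col0, height, width))
    return sum(costs) / len(costs)
```

A request aligned with the subset grid decodes exactly the points it asks for; any misalignment forces extra subsets to be partially decoded.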


Table 2. Results of uniformly distributed requests 

Scanning Method     Average Number of 
                    Data Points Decoded 

Hilbert scan        123100 
Raster scan         122522 


However, if the requests for data are assumed to be centered 
around some point of interest in the data set, the assumption of a 
uniform distribution is incorrect and the choice of scanning patterns 
may be important. A more accurate model of the requests may be a normal 
distribution around the point of interest. A 'smart' encoding 
algorithm was developed to test this theory. 

'SMART' ENCODING ALGORITHM 

The 'smart' encoding algorithm starts by encoding the data set 
using the method just described with one added feature. Each 
partitioned subset is subdivided into sub blocks and, as requests for 
the data are processed, a frequency count of the number of times 
each sub block is accessed is kept. 

After a sufficient number of requests have been processed, the 
data set is re-encoded and each subset is scanned with the pattern 
which provides the most efficient access based on past requests. A 
number which identifies the scanning pattern is encoded in the 
compressed file at the beginning of each subset followed by the first 
data value in the subset. Four different sets of 'smart' scanning 
patterns were developed. 

In order to simulate past requests, a set of 1000 blocks whose 
first pixel location was normally distributed around the point 
128x128 with a variance of 128 was generated. For simplicity, the 
requested block size was held constant at 256x256, 1 band. 

Once the data set had been re-encoded, each of the four 'smart' 
scanning patterns was tested by calculating the number of data 
points that needed to be decoded to retrieve a test set of normally 
distributed requests. The test requests were normally distributed 
around the point 128x128 with a variance of 256. The results of 
the four 'smart' scanning patterns are given in Table 3. 

Notice that there is an increase in storage requirements of 
approximately 16 bytes per subset. This is due to the need to store 
the frequency counts. 

Pattern #1 

The first set of 'smart' scanning patterns, shown in Figure 3, 
used the Hilbert scan with the processing order of the first sub 
blocks determined by the frequency count. There were eight 
possible patterns; therefore, three bits were used at the 
beginning of each subset to identify which pattern was being used. 

The results shown in Table 3 indicate that approximately 20 
thousand fewer data points were decoded using this method when 
compared to the original Hilbert or raster scan. 

Pattern #2 

The second set of 'smart' scanning patterns, shown in Figure 4, 
used the raster scan with the processing order of the rows and 
columns determined by the frequency count. There were only 
four possible patterns; therefore, only two bits 
were used at the beginning of each subset to identify which pattern 
was used. 

The results shown in Table 3 indicate that these scanning 
patterns provided a savings of approximately 12 thousand data 
points over the scanning patterns used in the first set. 
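
One way to picture the selection among the four raster patterns is the sketch below. It rests on assumptions that the text does not confirm: the four patterns are taken to be raster scans started from each corner of the subset, past requests are represented directly as rectangles inside the subset rather than as sub block frequency counts, and a subset is decoded from its first scanned point up to the last requested one. All names are hypothetical.

```python
def raster_cost(rect, n, flip_rows, flip_cols):
    """Points decoded to reach the last requested point of rect = (r0, r1, c0, c1)."""
    r0, r1, c0, c1 = rect
    # With a flipped axis the scan starts at the far edge, so the point reached
    # last is the one nearest the near edge.
    last_row = (n - 1 - r0) if flip_rows else r1
    last_col = (n - 1 - c0) if flip_cols else c1
    return last_row * n + last_col + 1


def choose_raster_pattern(past_rects, n=128):
    """Return a 2-bit pattern id minimizing the total cost over past requests."""
    options = [(fr, fc) for fr in (False, True) for fc in (False, True)]
    best = min(options, key=lambda p: sum(raster_cost(r, n, *p) for r in past_rects))
    return (int(best[0]) << 1) | int(best[1])
```

The returned identifier fits in two bits, which matches the per-subset overhead quoted for this set of patterns.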

Pattern #3 

Frequencies for the corner second sub blocks were calculated 
for the third set of 'smart' scanning patterns. The scanning patterns 
used the Hilbert scan with the processing order of the second sub 
blocks determined by the frequency count. There were 16 possible 
scanning patterns, the eight used in the first set and the eight 
additional patterns shown in Figure 5. Therefore, it was necessary 
to use four bits at the beginning of each subset to identify which 
pattern was used to encode the subset. 

The results shown in Table 3 indicate only a slight 
improvement over the patterns used in the previous method. 



Figure 3. Hilbert scan with processing order of first sub blocks 
determined by frequency count. 

Pattern #4 

A close examination of the previous three methods revealed 
that certain types of requests were handled more efficiently by 
the Hilbert scan and others by the raster scan, as shown in Figure 7. 
Therefore, the fourth set of 'smart' scanning patterns used a 
combination of the two patterns. The scanning patterns used 
consisted of the four patterns shown in Figure 4 and four new 
patterns which started at each corner and scanned the subset as 
shown in Figure 6. There was a total of eight different patterns; 
therefore, only three bits were used at the beginning of each subset 
to identify which pattern had been used to encode the subset. 

The results shown in Table 3 indicate that this set of scanning 
patterns produced a significant decrease in the number of data 
points decoded to retrieve the requested data sets. 
CONCLUSION 

An approach to decompressing portions of losslessly 
compressed multi band data was presented. The proposed 
approach first partitioned the data set into three dimensional 
subsets and encoded the first data value with full resolution. The 
location of the first data value in the compressed file was then 
added to the code book. Therefore, any subset could be accessed 
by opening the compressed file, advancing the file pointer to the 
location of the first data value, and beginning decoding there as 
opposed to starting at the beginning of the compressed file. 


The algorithm was first tested by assuming that requests for 
the data would be uniformly distributed over the entire data set. 
Both the Hilbert and raster scanning patterns were used to encode 
the data set. Both scanning patterns produced approximately the 
same results. 

The requests were then assumed to be normally distributed 
around a particular point of interest in the data set and a 'smart' 
algorithm was developed to select the scanning pattern which 
provided the most efficient access to the data, based on previous 
requests. This 'smart' algorithm used properties of both the Hilbert 
and raster scanning patterns and provided significantly better 
results. 


Table 3. Results of 'smart' scanning patterns 

Scanning Method                                  Average Number of      Additional Storage 
                                                 Data Points Decoded    Requirements per 
                                                                        Partitioned Subset 

Original Hilbert scan.                           122345                 4 Bytes 

Original raster scan.                            123020                 4 Bytes 

Hilbert scan with processing order of first      101012                 20 Bytes, 3 Bits 
sub blocks determined by frequency count. 

Raster scan with processing order of rows        89400                  20 Bytes, 2 Bits 
and columns determined by frequency count. 

Hilbert scan with processing order of second     88069                  20 Bytes, 4 Bits 
sub blocks determined by frequency count. 

Combination of Hilbert and raster scans with     70291                  20 Bytes, 3 Bits 
processing order of sub blocks, rows and 
columns determined by frequency count. 
Figure 5. Hilbert scan with the processing order of the second sub 
blocks determined by the frequency count. 

Figure 6. Example of Hilbert scan with 4x4 building blocks. 

Figure 7. Type of request in subsets and the scanning pattern which 
provides the best results (panel labels include "Hilbert lower left" 
and "Hilbert lower right"). 
Compression of Color-Mapped Images 

Andrew C. Hadenfeldt, Member, IEEE, and Khalid Sayood, Member, IEEE 



Abstract — Multispectral data is often displayed and stored as 
a color-mapped or pseudo-color image. Pseudo-color is also 
used to enhance features in a single-band image. The use of 
pseudo-color tends to rearrange the structure in the image in 
such a way as to prevent efficient compression. This structure 
can be restored by sorting the color maps. Restoration of the 
structure increases the efficiency of lossless compression and 
permits the use of lossy compression algorithms. The latter 
benefit is especially useful for many progressive transmission 
algorithms. 


I. Introduction 

MULTISPECTRAL remotely sensed data are often 
displayed as composite images on color monitors. 
These composite images are generated by treating three 
spectral bands of the multispectral dataset as the red, 
green, and blue planes of an RGB image. If the pixels in 
each plane are represented by 8 b, each pixel in the composite 
image is represented by 24 b, allowing a total of 
$2^{24}$ colors to be displayed. More expensive systems may 
use more than 24 b/pixel. A disadvantage of the full-color 
display is the large amount of memory required to repre- 
sent an image. This memory must be quickly accessible 
to allow real-time updating of the CRT, making full-color 
image displays costly. Also, the images involved require 
large amounts of storage space, whether in display mem- 
ory or on a mass-storage device. A less expensive solution 
is needed. 

Many commercial image processing and geographic in- 
formation systems (GIS) use a pseudo-color or color- 
mapped frame buffer. The values stored in memory are 
used as indexes into a 24-b table, the color map. Each 
entry in the color map consists of 8-b values for the red, 
green, and blue portions of the pixel. The color-mapped 
system allows the display of a small number of colors at 
a time, $2^8$ for the system shown in the figure, which can 
be selected from a larger set of colors ($2^{24}$ for this example). 
By careful selection of the colors in the color map, 
a large variety of images can be displayed, often with 
quality approaching that of a full-color display system. 
The color map is obtained through a quantization process, 
the goal of which is to select the most "representative" 
256 colors from the available colors. The color-map generation 
algorithms do not attempt to put the color map 
entries in any particular order. Unfortunately, this lack of 
order makes compression more difficult. 

Manuscript received January 3, 1994. This work was supported in part 
by the NASA Goddard Space Flight Center under Grant NAG 5-1612 and 
by the NASA Lewis Research Center under Grant NAG 3-806. 

A. C. Hadenfeldt is with the University of Nebraska Medical Center, 
Omaha, NE 68198. 

K. Sayood is with the Department of Electrical Engineering, University 
of Nebraska-Lincoln, Lincoln, NE 68588. 

IEEE Log Number 9402214. 

Perhaps the most useful trait of image data used in im- 
age compression is the pixel-to-pixel correlation. For an 
achromatic image, this means that the integer pixel values 
(which describe the intensities of the pixels) will be nu- 
merically similar for spatially adjacent pixels. For a full- 
color image, a similar condition exists for adjacent pixels 
on individual color planes. In a color-mapped image, the 
values stored in the pixel array are no longer directly re- 
lated to the pixel intensity (or the magnitude of one of the 
color components). Two color indexes that are numeri- 
cally adjacent (close) may point to two very different 
colors. Hence, the correlation between pixels appears to 
be lost. This makes compression of these images very dif- 
ficult. With the arrival of instruments that generate im- 
ages with higher spatial resolution, and significantly more 
spectral bands, compression of these images has become 
an important problem. 

For most compression algorithms to work there has to 
be some correlation structure in the data. The structure in 
the color-mapped image still exists, but only via the color 
map. Therefore for compression, this structure has to be 
reintroduced into the image. We show that the reintro- 
duction of structure can be accomplished by sorting the 
color map. In this paper we study the sorting of color maps 
and show how the resulting structure can be used in both 
the lossless and lossy compression of images. 

The sorting procedure is described in the following sec- 
tion and the lossless and lossy compression results are 
presented in Sections III and IV, respectively. 

II. Color-Map Sorting 

Sorting the color map can be done to satisfy one of two 
possible goals. The first is the desire to restore the cor- 
relation among the pixels to allow them to be efficiently 
coded, i.e., a reduction in the differential entropy. The 
second goal is to allow the introduction of small errors in 
the color index values, such as those resulting from quan- 
tization, without a large reduction in the subjective image 
quality. Even in this latter case, the desire for entropy 
reduction is implied since that is the purpose of quanti- 
zation. These two goals conflict somewhat since the sen- 
sitivity of the eye to color errors is dependent on many 
things. It will be useful to find a solution that satisfies 
both of these goals to some degree. 

Color-map sorting is a combinatorial optimization 
problem. Treating the K color-map entries as vectors, the 
problem is defined as follows. Given a set of vectors 
$\{a_1, a_2, \ldots, a_K\}$ in a three-dimensional vector space and a 
distance measure $d(i, j)$ defined between any two vectors 
$a_i$ and $a_j$, find an ordering function $L(k)$ that minimizes 
the total distance D: 

$$D = \sum_{k=1}^{K-1} d[L(k), L(k+1)]. \tag{1}$$

The ordering function L is constrained to be a permutation 
of the sequence of integers $\{1, \ldots, K\}$. Another possibility 
results when the list of color-map entries is considered 
as a ring structure. That is, the color-map entry 
specified by $L(K)$ is now considered to be adjacent to the 
entry specified by $L(1)$. In this case, an additional term of 
$d[L(K), L(1)]$ is added to the distance formula D. 
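
As a small illustration of the cost D in (1), the sketch below evaluates an ordering for both the linear and the ring structure. It assumes each color is a 3-tuple and that d is the Euclidean distance; the function name and the 0-based indexing are choices made for the example.

```python
import math


def path_cost(colors, order, ring=False):
    """Total distance D along `order`, a permutation of indexes into `colors`."""
    cost = sum(
        math.dist(colors[order[k]], colors[order[k + 1]])
        for k in range(len(order) - 1)
    )
    if ring and len(order) > 1:
        # Ring structure: the last entry is also adjacent to the first.
        cost += math.dist(colors[order[-1]], colors[order[0]])
    return cost
```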

The sorting problem is similar to the well-known traveling 
salesman problem, and is identical if the color map 
is considered as a ring structure. As such, the problem is 
known to be NP-complete [1], and the number of possible 
orderings to consider is $\frac{1}{2}[(K - 1)!]$ [6]. Algorithms 
exist that can solve the problem exactly [2], [6]; however, 
these algorithms are computationally feasible only for K 
no greater than about 20. Efficient algorithms for locating 
a local minimum exist [6] for K < 145. For large color 
maps such as K = 256, another approach is necessary. 
Two techniques were tested. The first is a "greedy" technique, 
discussed in Section II-A. The second involves an 
algorithm that has performed well in practice, a technique 
known as simulated annealing. Simulated annealing was 
chosen as the sorting method for the color maps in this 
paper and is described in more detail in Section II-B. 

To complete the problem definition above, the distance 
metric d must be determined. There are several possibilities 
depending on the color space used. For the present 
paper, the distance metric was chosen to be an (unweighted) 
Euclidean distance, and different color spaces 
were investigated. Three color spaces were selected: the 
NTSC RGB space, the CIE L*a*b* space, and the CIE 
L*u*v* space. The NTSC RGB space was chosen since 
it corresponds to the primary colors of the original images. 
Color spaces that can be linearly transformed to the 
NTSC RGB space were not considered, since the use of 
an unweighted Euclidean distance measure would give 
similar results for such a color space. The two CIE color 
spaces were selected since they provide a means to measure 
perceptual color differences. 

A. Greedy Sorting Algorithm 

The first color-map sorting algorithm investigated was 
a greedy algorithm. As the name implies, this is simply a 
"take what you can get" approach. The algorithm begins 
by selecting a starting node (i.e., a color vector). From 
this node, proceed to a node that has not yet been visited 
by selecting the path with the least cost. For the color-map 
sorting problem, the "cost" is the distance between 
the colors, the function d(i, j) defined in (1). The algorithm 
proceeds until all nodes (colors) have been visited. 
To avoid any penalties due to the choice of the starting 
node, a path was formed starting at each of the 256 nodes. 
From the 256 paths, the path of least cost was then selected. 
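
A sketch of this greedy procedure is given below. It assumes Euclidean distance between 3-tuples and a linear (non-ring) cost; ties are broken by index, which is a choice made for the example since the text does not specify tie handling. Function names are illustrative.

```python
import math


def greedy_order(colors, start):
    """Nearest-unvisited-neighbor path through all colors, beginning at `start`."""
    unvisited = set(range(len(colors))) - {start}
    order = [start]
    while unvisited:
        last = colors[order[-1]]
        nxt = min(unvisited, key=lambda j: (math.dist(last, colors[j]), j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def linear_cost(colors, order):
    """Sum of distances between consecutive colors along the path."""
    return sum(math.dist(colors[a], colors[b]) for a, b in zip(order, order[1:]))


def best_greedy_order(colors):
    """Try every starting node and keep the cheapest greedy path."""
    paths = (greedy_order(colors, s) for s in range(len(colors)))
    return min(paths, key=lambda p: linear_cost(colors, p))
```

Each greedy path takes on the order of K squared distance evaluations, so trying all K starting nodes stays inexpensive for K = 256.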

Tests using the greedy sorting algorithm indicated that 
it was not very successful in sorting the color maps. The 
resultant images were still quite sensitive to small errors 
in the color indexes, and had high differential entropies. 
The simulated annealing algorithm in the next section was 
able to provide better results in both categories. 


B. Sorting Using Simulated Annealing 

Simulated annealing [1], [7] is a stochastic technique 
for combinatorial minimization. The basis for the tech- 
nique comes from thermodynamics and observations con- 
cerning the properties of materials as they are cooled. The 
technique described in this section is based on the imple- 
mentation in [7]. 

In the traveling salesman problem, the goal is to visit 
each city only once and return to the original city with a 
minimum path cost. Similarly, solving the color-map 
sorting problem involves selecting each color only once 
while minimizing the sum of the distances between the 
colors. To find a solution using simulated annealing, an 
initial path through the nodes (cities, colors) is chosen, 
and its cost computed. The algorithm then proceeds as 
follows: 

1) Select an initial temperature $T$ and a cooling factor 
$\alpha$. 

2) Choose a temporary new path by perturbing the 
current path (see below), and compute the change in path 
cost $\Delta E = E_{new} - E_{old}$. If $\Delta E < 0$, accept the new path. 

3) If $\Delta E > 0$, randomly decide whether or not to 
accept the path. Generate a random number $r$ from a uniform 
distribution in the range [0, 1), and accept the new 
path if $r < \exp(-\Delta E / T)$. 

4) Continue to perturb the path at the current temperature 
for $I$ iterations. Then, "cool" the system by the 
cooling factor: $T_{new} = \alpha T_{old}$. Continue iterating using the 
new temperature. 

5) Terminate the algorithm when no path changes are 
accepted at a particular temperature. 

The decision-making process is known as the Metrop- 
olis algorithm. Note that the decision process will allow 
some changes to the path that increase its cost. This makes 
it possible for the simulated annealing method to avoid 
easily being trapped in a local minimum of the cost func- 
tion. Hence, the algorithm is less sensitive to the initial 
path choice. Aarts and Korst [1] show that if certain con- 
ditions are satisfied, the simulated annealing technique can 
asymptotically converge to a global optimum. Even in 
cases where it does not converge to the optimum, the 
method often provides high-quality solutions. 

In the above description, initial values for $T$, $\alpha$, and $I$ 
need to be selected. Selection of these values requires 
some experimentation, although a few guidelines are provided 
in the references. In this work, initial values of $T$ 
ranged from 80 to 500, depending on the color space used. 
The cooling factor $\alpha$ was usually chosen as 0.9. The simulated 
annealing algorithm seemed to be most sensitive to 
the choice of this value, as values outside the range (0.85, 
0.95) caused the cooling to occur too slowly or too 
quickly. The number of iterations per temperature $I$ was 
chosen as 100 times the number of nodes (colors), or 
25 600. However, to improve the execution speed of the 
algorithm, an improvement suggested by Press [7] was added, 
which causes the algorithm to proceed to the next temperature 
if 10 x (number of nodes) = 2560 successful path 
changes are made at a given temperature. 

Also, a method for perturbing the path must be selected. 
In this work, the perturbations were made using 
the suggestions of Lin [6], [7]. At each iteration, one of 
two possible changes to the path is made, chosen at random. 
The first is a path transport, which removes a segment 
of the current path and reinserts it at another point 
in the path. If we think of the color map as an array, this 
corresponds to moving a segment of the array to a different 
location in the array. The "hole" left in the array is 
filled up by sliding the components of the array down or 
up (depending on the new insert location). The location 
of the segment, its length, and the new insertion point are 
chosen at random. The second perturbation method, called 
path reversal, removes a segment of the current path and 
reinserts it at the same point in the path, but with the nodes 
in reverse order. The location and length of the segment 
are again randomly chosen. 
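
The annealing loop with the two perturbations can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of [7]: distances are Euclidean, the default parameter values are placeholders within the ranges quoted in the text, the equal probability of the two moves is assumed, and the `t_min` floor is an extra safeguard added here so that the sketch always terminates. All names are hypothetical.

```python
import math
import random


def tour_cost(colors, order, ring=True):
    """Path cost D, with the closing term added for the ring structure."""
    cost = sum(math.dist(colors[a], colors[b]) for a, b in zip(order, order[1:]))
    if ring and len(order) > 1:
        cost += math.dist(colors[order[-1]], colors[order[0]])
    return cost


def perturb(order, rng):
    """Apply a random path reversal or path transport to a copy of the path."""
    i, j = sorted(rng.sample(range(len(order)), 2))
    segment = order[i:j + 1]
    if rng.random() < 0.5:
        # Path reversal: same position, nodes in reverse order.
        return order[:i] + segment[::-1] + order[j + 1:]
    # Path transport: remove the segment and reinsert it elsewhere.
    rest = order[:i] + order[j + 1:]
    k = rng.randrange(len(rest) + 1)
    return rest[:k] + segment + rest[k:]


def anneal(colors, T=100.0, alpha=0.9, ring=True, seed=0, t_min=1e-3):
    """Sort color indexes by simulated annealing; returns (order, cost)."""
    rng = random.Random(seed)
    n = len(colors)
    order = list(range(n))
    cost = tour_cost(colors, order, ring)
    max_iters, max_success = 100 * n, 10 * n
    while T > t_min:  # t_min is an added guard, not part of the described method
        accepted = 0
        for _ in range(max_iters):
            candidate = perturb(order, rng)
            new_cost = tour_cost(colors, candidate, ring)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, sometimes accept increases.
            if delta < 0 or (delta > 0 and rng.random() < math.exp(-delta / T)):
                order, cost = candidate, new_cost
                accepted += 1
                if accepted >= max_success:
                    break  # early move to the next temperature
        if accepted == 0:
            break  # no path change accepted at this temperature
        T *= alpha
    return order, cost
```

Recomputing the full path cost for every candidate keeps the sketch short; an efficient version would update only the few distances affected by each perturbation.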

The algorithm outlined in the previous paragraphs for- 
mulates color-map sorting as a traveling salesman prob- 
lem. This type of problem usually assumes a complete 
tour will be made (i.e., the salesman desires to return to 
the original city). Hence, the color map is assumed to 
have a ring-like structure. However, the simulated an- 
nealing technique can also be used if this is not the case, 
allowing the color map to be considered as a linear list 
structure. Experiments using both structures were con- 
ducted. 

III. Color-Map Sorting and Lossless Compression 
A measure that is used to describe the statistical properties 
of an image is its entropy. Entropy provides a measure 
of the randomness of a source, based on an assumed 
model for that source. It also provides an estimate of the 
number of bits per sample required to code the source. 
Treating the image as a memoryless source with an alphabet 
S containing R symbols, the zeroth-order entropy 
$H_0$ is defined as 

$$H_0 = -\sum_{i=1}^{R} P(S_i) \log_2 P(S_i) \text{ bits} \tag{2}$$


where $P(S_i)$ is the probability of occurrence of symbol $S_i$. 
If there is a correlation between adjacent pixels, another 
possibility is to consider a first-order model for the image. 
If the image is transmitted as a one-dimensional source in 
a row-by-row (or column-by-column) manner, a first-order 
differential entropy $H_1$ can be defined on an alphabet 
D consisting of the 2R - 1 possible differences between 
the elements of alphabet S: 

$$H_1 = -\sum_{j=1}^{2R-1} P(D_j) \log_2 P(D_j) \text{ bits}. \tag{3}$$

TABLE I 
Entropies of the Source Images 

Image      H_0      H_1 
Lena       7.617    7.413 
Park       7.470    7.797 
Omaha      7.242    7.165 
Lincoln    5.916    6.674 


These quantities were computed using the index arrays for 
four test images and are listed in Table I. The Lincoln and 
Omaha images were constructed from channels 2, 3, and 
4 of a thematic mapper simulator (TMS) image and are 
shown in Fig. 1. The Lena and Park images are well-known 
standard images. 
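
The two entropy measures in (2) and (3) can be estimated from an index array as sketched below. The sketch assumes the image is a list of rows of integer indexes read row by row, and that differences are taken between consecutive samples of the flattened sequence, including across row boundaries; how the first sample and the row ends were treated in the reported figures is not stated. Function names are illustrative.

```python
import math
from collections import Counter


def entropy(symbols):
    """Empirical entropy in bits per symbol of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def zeroth_order_entropy(index_rows):
    """H_0: entropy of the index values themselves."""
    return entropy([v for row in index_rows for v in row])


def first_order_differential_entropy(index_rows):
    """H_1: entropy of the differences between consecutive indexes."""
    flat = [v for row in index_rows for v in row]
    return entropy([b - a for a, b in zip(flat, flat[1:])])
```

An image whose indexes change smoothly has a low H_1 even when H_0 is high, which is the property that sorting the color map is meant to restore.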

The color maps for these images were sorted using simulated 
annealing in the RGB and L*u*v* space. The effect 
of the sorting on the color map is shown in Fig. 2. Fig. 
2(a) displays the colors of the 256 indexes of the color 
map for the Omaha image before sorting, while the colors 
of the 256 indexes after sorting are shown in Fig. 2(b). 
The effect of the sorting has been to make neighboring 
index values correspond to colors that are also close in a 
perceptual sense. 

The numerical effects of sorting the color maps of the 
test images using simulated annealing are shown in Table 
II and Table III. Results for sorting the color map as a 
circular ring structure are shown in Table II, while the 
results of sorting the color map as a linear structure are 
shown in Table III. Values are given in the tables for the 
resulting first-order entropy and the final path cost (the 
distance measure D). 

Fig. 2. (a) Color map for Omaha image before sorting. (b) Color map for 
Omaha image after sorting. 

TABLE II 
[Title and the RGB and L*a*b* columns are not recoverable from the source] 

Image      L*u*v* Space         LZW (GIF) 
Name       Cost      H_1 
Lena       208.49    5.480      6.43 
Park       310.41    6.2?8      6.88 
Omaha      363.21    6.178      7.03 
Lincoln    224.06    5.478      5.74 

TABLE III 
Resultant Images with Linearly Sorted Color Maps 

Image      RGB Space          L*a*b* Space         L*u*v* Space 
Name       Cost     H_1       Cost       H_1       Cost      H_1 
Lena       11.68    5.575     547.29     5.933     200.31    5.512 
Park       15.66    6.26?     150?.25    6.775     292.29    6.546 
Omaha      10.81    6.532     1004.63    6.554     253.66    6.199 
Lincoln    10.61    5.774     1177.?0    6.120     204.64    5.735 

(A "?" marks a digit that is illegible in the source.) 

Note that the zeroth-order entropy $H_0$ is not changed by 
the sorting process, since permuting the color-map entries 
does not change the frequency of the occurrence of a particular 
color. The lower first-order entropies of the resultant 
images indicate that some of the spatial correlation 
between color indexes has been restored in each case. The 
sorting results for the NTSC RGB space show that sorting 
in this space yields good results, if entropy reduction (the 
first goal stated above) is the goal. However, the L*u*v* 
space sorting gives better results, with the added advantage 
that the perceptual differences between color-map entries 
have been considered. Hence, the resultant images 
from this sort should also be able to accept quantization 
errors while maintaining good subjective quality, the second 
goal stated previously. We examine this further in the 
next section. Comparing the results shown in Tables I-III, 
we can see that the sorting of the color map has resulted 
in a drop in entropy of 2 b/pixel for the Lena image 
and 1-1.5 b/pixel for the other images. Entropy coding 
techniques such as Huffman coding and arithmetic coding 
permit the lossless encoding of data close to entropy. 
Therefore, we can treat the entropy figures as estimates 
of the coding rates. For 512 x 512 images a savings of 
1-2 b/pixel translates to a savings of between 32 768 to 
65 536 bytes/image. For a large database of images this 
could be a considerable saving. As many remote sensing 
applications require large repositories of images, using 
sorted color maps can lead to a significant reduction in 
storage requirements. 
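
The storage figures quoted above follow from a short calculation, sketched here with an illustrative function name: the number of pixels times the bits saved per pixel, divided by 8 bits per byte.

```python
def bytes_saved(rows, cols, bits_saved_per_pixel):
    """Storage saved per image, in bytes, for a given per-pixel rate reduction."""
    return rows * cols * bits_saved_per_pixel / 8


low = bytes_saved(512, 512, 1)   # 1 b/pixel saved
high = bytes_saved(512, 512, 2)  # 2 b/pixel saved
```

For a 512 x 512 image this gives 32 768 bytes at 1 b/pixel and 65 536 bytes at 2 b/pixel.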

For comparison, the images were also compressed us- 
ing the Lempel-Ziv algorithm used in the GIF format [3]. 
The numbers shown in Table II were obtained after re- 
moving the overhead included in all GIF files. The per- 
formance of the Lempel-Ziv scheme is between 0.25 and 
1 b/pixel worse than the differential entropy of the sorted 
images. 


IV. Color-Map Sorting and Lossy Compression 



NASA's earth observing system (EOS) will result in 
even larger archives of remotely sensed images. In order 
for remote users to easily access these images, a 
"browse" facility that allows the user to quickly access 
low-resolution versions of the images is a necessity. Currently, 
in the global land information system (GLIS) the 
browse facility is implemented by only storing previously 
subsampled low-resolution versions of the images on-line. 

Browse features that allow on-line delivery of full-resolution 
images can be implemented through the use of progressive 
transmission. In most progressive transmission 
schemes, images compressed using lossy compression 
techniques are first sent to the remote user. If the image 
is what the user is looking for, he or she can request that 
the image be refined by sending more information. For 
standard pseudo-color images, lossy compression (with 
even little loss) would result in the destruction of most of 
the features of the image. This is evident from the image 
in Fig. 3(a), where the three least-significant bits have 
been dropped from the image using unsorted color maps. 
The sorting of the color map restores some perceptual 
structure to the color-map indexes in the sense that indexes 
close in numerical value are also close in some perceptual 
sense. Therefore, it should be possible to introduce 
errors into the indexes without destroying the image. 
To verify this hypothesis, we dropped the three least-significant 
bits of the L*u*v*-sorted Omaha image. Good 
subjective results were obtained using quantization levels 
down to as low as 5 b/pixel from the 8-b original. Fig. 3 
shows the result of quantizing the Omaha image to 
5 b/pixel, before and after the color map has been sorted. 
Notice that the image in Fig. 3(a), which used the unsorted 
color maps, displays severe distortion obscuring 
most of the image, while the image in Fig. 3(b), which 
used the sorted color map, suffered only minimal degradation. 
Several caveats are in order here. While the distances 
between the 8-b indexes have more perceptual 
meaning after sorting, the sorted color-map image should 
not be assumed to have the same properties as an 8-b 
monochrome image. In some cases, if the distance between 
the original and reconstructed (compressed and decompressed) 
indexes is large enough, there might be a 
drastic change in color between those pixels in the original 
and reconstructed image, which would be immediately 
apparent. In the monochrome case large distances 
would correspond to changes in shading, which might be 
overlooked by the viewer. Also, if the original (noncolor-mapped) 
images are available, better compression performance 
would be obtained by compressing the original 
image than by compressing the (sorted) color-mapped image. 

Fig. 3. (a) Omaha image with unsorted color map quantized to 5 b. (b) 
Omaha image with sorted color map quantized to 5 b. 
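
The bit-dropping experiment described in this section amounts to masking the low-order bits of every index, as in the sketch below. It assumes 8-b indexes stored as a list of rows; the function name is illustrative.

```python
def drop_lsbs(index_rows, bits=3):
    """Zero the `bits` least-significant bits of every 8-b color index."""
    mask = 0xFF & ~((1 << bits) - 1)
    return [[v & mask for v in row] for row in index_rows]
```

With three bits dropped, each index moves by at most 7 positions in the color map. When the map is sorted, entries that close together tend to be perceptually similar, which is why the degradation stays small; with an unsorted map the displaced index can point to an unrelated color.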

To see how well the sorted color-mapped images lend themselves to
lossy compression, we compress them using particular
implementations of two popular lossy compression techniques, the
discrete cosine transform (DCT) and differential pulse code
modulation (DPCM).

A. DCT Coding of Color-Mapped Images

In the DCT approach the image is divided into N x N 
blocks (N is typically 8). The blocks are then transformed 
using the DCT basis set. In the transform domain most of 
the energy is compacted into a few coefficients. The cod- 
ing resources (bits) are devoted to the coefficients with 
higher energy so a high-energy coefficient will be quan- 
tized with more bits, while a low-energy coefficient will 
be quantized with few or zero bits (i.e., discarded). At 
the receiver the quantized coefficients are transformed 
back to the spatial domain. The allocation of bits to the 
individual coefficients can be based on the average statis- 
tics of the image (or class of images) or on the character- 
istics of each individual block [5]. The latter approach is 
used in the recently approved JPEG standard for image 
compression [10]. 
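
The transform-and-discard idea described above can be sketched in a few
lines of Python. This is an illustration under stated assumptions, not
the JPEG algorithm: it uses an orthonormal DCT-II, a hypothetical smooth
8 x 8 test block, and a helper `keep_largest` that simply zeroes all but
the largest coefficients as a crude stand-in for bit allocation.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis; row u holds the u-th basis vector."""
    rows = []
    for u in range(n):
        scale = math.sqrt((1.0 if u == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct2(block):
    """Forward 2-D DCT of a square block: C * B * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coef):
    """Inverse 2-D DCT: C^T * X * C."""
    c = dct_matrix(len(coef))
    return matmul(matmul(transpose(c), coef), c)


def keep_largest(coef, count):
    """Zero all but the `count` largest-magnitude coefficients (count >= 1).
    Ties at the threshold are all kept."""
    flat = sorted((abs(v) for row in coef for v in row), reverse=True)
    threshold = flat[count - 1]
    return [[v if abs(v) >= threshold else 0.0 for v in row] for row in coef]


# Hypothetical smooth block: a brightness ramp.
block = [[16 * r + 2 * c for c in range(8)] for r in range(8)]
coef = dct2(block)
total_energy = sum(v * v for row in coef for v in row)
kept = keep_largest(coef, 2)
kept_energy = sum(v * v for row in kept for v in row)
print("fraction of energy in 2 of 64 coefficients:", kept_energy / total_energy)
```

For a smooth block almost all of the energy lands in a handful of
coefficients, which is what makes discarding the rest inexpensive. An
unsorted index image has no such smoothness, which is consistent with
the poor result reported for Fig. 4.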

Fig. 4 shows the Omaha image with the unsorted color map coded at
2 b/pixel using the JPEG algorithm.¹ The JPEG coding was applied
to the index values, leaving the color map intact. As can be seen
from the Omaha image shown in the figure, the river is about the
only thing still visible. It should be noted that for 8-b
monochrome images, DCT coding at 2 b/pixel generally provides a
reconstruction that is indistinguishable from the original.

In Fig. 5 we show the same image, this time with the 
sorted color map, coded at 2 and 1 b/pixel using the JPEG 
algorithm. 

B. DPCM Coding of Color-Mapped Images 

The DPCM system consists of two main blocks, the quantizer and
the predictor (see Fig. 6). The predictor uses the correlation
between samples of the waveform $s(k)$ to predict the next sample
value. This predicted value is removed from the waveform at the
transmitter and reintroduced at the receiver. The prediction
error is quantized to one of a finite number of values; the
quantized value, denoted by $e_q(k)$, is coded and transmitted to
the receiver. The difference between the prediction error and the
quantized prediction error is called the quantization error, or
the quantization noise. If the channel is error free, the
reconstruction error at the receiver is simply the quantization
error. To see this, note (Fig. 6) that the prediction error
$e(k)$ is given by

$$e(k) = s(k) - p(k) \tag{4}$$

¹The JPEG-coded images were coded using software from the
Independent JPEG Group.



where $s(k)$ is the original signal and $p(k)$ is its prediction,
which is given by

$$p(k) = \sum_{j} a_j \hat{s}(k - j) \tag{5}$$

$$\hat{s}(k) = e_q(k) + p(k). \tag{6}$$

Assuming an additive noise model, the quantized prediction error
$e_q(k)$ can be represented as

$$e_q(k) = e(k) + n_q(k) \tag{7}$$

where $n_q(k)$ denotes the quantization noise. The quantized
prediction error is coded and transmitted to the receiver. If the
channel is noisy, this is received as $\hat{e}_q(k)$, which is
given by

$$\hat{e}_q(k) = e_q(k) + n_c(k) \tag{8}$$

where $n_c(k)$ represents the channel noise. The output of the
receiver $\tilde{s}(k)$ is thus given by

$$\tilde{s}(k) = \tilde{p}(k) + \hat{e}_q(k) \tag{9}$$

$$\tilde{p}(k) = p(k) + n_p(k) \tag{10}$$

the additional term $n_p(k)$ being the result of the introduction
of channel noise into the prediction process. Using (4), (7),
(8), and (10) in (9) we obtain

$$\tilde{s}(k) = s(k) + n_q(k) + n_c(k) + n_p(k). \tag{11}$$

If the channel is error free, the last two terms in (11) drop out
and the difference between the original and the reconstructed
signal is simply the quantization error.
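
The error-free case of relations (4)-(11) can be checked numerically
with a short Python sketch. The specifics are assumptions made for
illustration and are not taken from the paper: a first-order predictor
with a single coefficient `a` (default 1), zero initial state, and a
uniform quantizer with step size `step`.

```python
import math


def quantize(e, step):
    """Uniform quantizer with a level at zero (assumed for this sketch)."""
    return step * math.floor(e / step + 0.5)


def dpcm_encode(signal, step, a=1.0):
    """Return the quantized prediction errors e_q(k) and the noise n_q(k)."""
    recon_prev = 0.0                 # previous reconstructed sample
    e_q_list, n_q_list = [], []
    for s in signal:
        p = a * recon_prev           # prediction from the reconstructed past, as in (5)
        e = s - p                    # prediction error, as in (4)
        e_q = quantize(e, step)
        e_q_list.append(e_q)
        n_q_list.append(e_q - e)     # quantization noise, as in (7)
        recon_prev = e_q + p         # transmitter-side reconstruction, as in (6)
    return e_q_list, n_q_list


def dpcm_decode(e_q_list, a=1.0):
    """Receiver for an error-free channel: the same predictor loop."""
    recon_prev = 0.0
    out = []
    for e_q in e_q_list:
        recon_prev = a * recon_prev + e_q
        out.append(recon_prev)
    return out


signal = [10, 12, 15, 40, 42, 41, 20, 18]   # hypothetical index values
e_q_list, n_q_list = dpcm_encode(signal, step=4)
decoded = dpcm_decode(e_q_list)
for s, r, n in zip(signal, decoded, n_q_list):
    print(s, r, r - s, n)                   # last two columns agree
```

Because the predictor at the transmitter runs on reconstructed samples,
the receiver tracks it exactly, and the reconstruction error at each
sample equals the quantization noise at that sample.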

Fig. 5. (a) Omaha image with sorted color map coded at 1 b/pixel
using the JPEG algorithm. (b) Omaha image with sorted color map
coded at 2 b/pixel using the JPEG algorithm.

When the prediction error is small, it falls into one of the
inner levels of the quantizer, and the quantization noise is of a
type referred to as granular noise. If the prediction error falls
in one of the outer levels of the quantizer, the incurred
quantization error is called overload noise. Granular noise is
generally smaller in magnitude than the overload noise and is
bounded by the size of the quantization interval. The overload
noise, on the other hand, is essentially unbounded and can become
very large depending on the size of the prediction error.
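
The distinction between granular and overload noise can be made
concrete with a minimal Python sketch of a conventional uniform
quantizer with a fixed number of levels. The function name, the step
size, and the number of levels are assumptions chosen for illustration.

```python
import math


def clamp_quantize(e, step, size):
    """Uniform quantizer with `size` output levels; inputs beyond the
    outermost levels are clamped to them."""
    n_lo = -((size - 1) // 2)            # index of the smallest level
    n_hi = n_lo + size - 1               # index of the largest level
    n = math.floor(e / step + 0.5)       # nearest level index
    return step * min(max(n, n_lo), n_hi)


step, size = 2, 5                        # levels -4, -2, 0, 2, 4
for e in (3.4, 40):
    q = clamp_quantize(e, step, size)
    print(e, q, abs(e - q))
```

An input of 3.4 is reproduced with an error of 0.6, within half a step,
while an input of 40 is clamped to 4 and incurs an error of 36. The
second case is the overload condition that arises at edges.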

In the busy regions of images, especially edges, the prediction
error is generally large, leading to large overload noise values.
In monochrome images those noise values result in a blurred
appearance around edges, which may be acceptable for certain
applications. However, in color-mapped images these noise values
will result in splotches of different colors. The edge preserving
DPCM (EPDPCM) system avoids this problem by the use of a
recursively indexed quantizer [8], [9].

Fig. 6. Block diagram of a DPCM system.

For a given quantizer step size $\Delta$ and a positive integer
$K$, define $x_l$ and $x_h$ as follows:

$$x_l = -\left\lfloor \frac{K - 1}{2} \right\rfloor \Delta$$

$$x_h = x_l + (K - 1)\Delta \tag{12}$$

where $\lfloor x \rfloor$ is the largest integer not exceeding
$x$. A recursively indexed quantizer of size $K$ is a uniform
quantizer with step size $\Delta$ (the uniform spacing both
between the thresholds and between the output levels) and with
$x_l$ and $x_h$ being its smallest and largest output levels ($Q$
defined this way always has 0 as an output level). The
quantization rule $Q$ is given as follows. For a given input
value $x$ we have the following:

1) If $x$ falls in the interval
$[x_l + (\Delta/2), x_h - (\Delta/2)]$, then $Q(x)$ is the
nearest output level.

2) If $x$ is greater than $x_h - (\Delta/2)$, see if
$x_1 = x - x_h \in [x_l + (\Delta/2), x_h - (\Delta/2)]$. If so,
$Q(x) = [x_h, Q(x_1)]$. If not, form $x_2 = x - 2x_h$ and do the
same as for $x_1$. This process continues until for some $m$,
$x_m = x - m x_h$ falls in
$[x_l + (\Delta/2), x_h - (\Delta/2)]$, in which case $x$ will be
quantized into

$$Q(x) = [\underbrace{x_h, x_h, \ldots, x_h}_{m \text{ times}}, Q(x_m)]. \tag{13}$$

3) If $x$ is smaller than $x_l + (\Delta/2)$, a similar procedure
to this is used, i.e., $x_m = x - m x_l$ is formed so that it
falls in $[x_l + (\Delta/2), x_h - (\Delta/2)]$, and is quantized
to $[x_l, x_l, \ldots, x_l, Q(x_m)]$.

In summary, the quantizer operates in two modes: it operates in
one mode when the input falls in the range $(x_l, x_h)$, and
another when the input falls outside of the specified range.

The magnitude of the quantization error is therefore always
bounded by $\Delta/2$. This attribute makes it ideal for
application to the coding of color-mapped images. Another
advantage of the EPDPCM system is that, as the quantizer output
alphabet can be kept small without incurring overload error, the
output is amenable to entropy coding.
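
The three quantization rules above can be sketched in Python as
follows. This is an illustrative reading of the rules, not the authors'
implementation: the function names are hypothetical, the sketch assumes
a size of at least 3 so that at least one inner level exists, and ties
at the interval boundaries are resolved toward an inner level so that
the outermost levels act only as continuation symbols.

```python
import math


def riq_encode(x, step, size):
    """Recursively indexed quantization: return the list of output levels."""
    if size < 3:
        raise ValueError("this sketch assumes size >= 3")
    n_lo = -((size - 1) // 2)              # x_l / step
    n_hi = n_lo + size - 1                 # x_h / step
    x_l, x_h = n_lo * step, n_hi * step
    lo, hi = x_l + step / 2, x_h - step / 2
    symbols = []
    while x > hi:                          # rule 2: emit x_h, reduce the input
        symbols.append(x_h)
        x -= x_h
    while x < lo:                          # rule 3: emit x_l, raise the input
        symbols.append(x_l)
        x -= x_l
    n = math.floor(x / step + 0.5)         # rule 1: nearest output level
    n = min(max(n, n_lo + 1), n_hi - 1)    # keep the final symbol an inner level
    symbols.append(n * step)
    return symbols


def riq_decode(symbols):
    """The reconstruction is the sum of the transmitted levels."""
    return sum(symbols)


step, size = 2, 5                          # levels -4, -2, 0, 2, 4
for x in (0.4, 11, -9.5):
    symbols = riq_encode(x, step, size)
    print(x, symbols, abs(x - riq_decode(symbols)))
```

With a step size of 2 and five levels, an input of 11 is coded as
[4, 4, 2] and reconstructed as 10, so the error stays within half a
step however large the input is. Large inputs cost extra symbols rather
than extra distortion, and the alphabet never grows beyond the five
levels.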

Results using the EPDPCM system are shown in Fig. 
7. The images in Fig. 7(a) and (b) were coded at a rate of 
2 b/pixel and 1.37 b/pixel respectively. 

The advantage of DPCM systems over transform coding systems is
their low complexity and higher speed. However, the
reconstruction quality obtained using transform coding systems is
generally significantly higher than that of DPCM systems at a
given rate. Comparing Fig. 7(a) and (b) with Fig. 5(a) and (b),
this is clearly not the case for the sorted color-mapped images.
In fact, the quality of the 2-b EPDPCM-coded image is actually
somewhat higher than that of the 2-b DCT-coded image. Thus, using
the EPDPCM system provides advantages both in terms of complexity
and speed, and reconstruction quality.

Fig. 7. (a) Omaha image with sorted color map coded at 2 b/pixel
using EPDPCM. (b) Omaha image with sorted color map coded at
1.37 b/pixel using EPDPCM.


V. Conclusion

In this paper we have shown that the use of sorted color maps
makes color-mapped images of the type used by GIS amenable to
both lossless and lossy compression. The sorting of the color
maps can provide significant savings of resources for remote
sensing image archives, while making lossy compression of
color-mapped images possible. The latter fact allows for the use
of progressive transmission schemes with pseudo-color and
color-mapped images. For lossy compression, conventional wisdom
dictates the use of DCT coding for most types of images. However,
for color-mapped images, DPCM coding might be more advantageous.